[D] Is there a well-maintained list of good “benchmark” datasets for ML?

Written by torontoai on June 24, 2019. Posted in Reddit MachineLearning.

I’m looking for up-to-date datasets to benchmark various algorithms against the performance (both speed and accuracy) of published models.

I’ve found some datasets, but the main issue is that they are either:

a) very old and small, e.g. most datasets hosted by UCI, which are rather “easy” to “solve” nowadays, and most papers using them came out decades ago. Even barring that, a lot of the papers dealing with the data are not ideal for benchmarks per se, because they are not very specific about how they split the data into train/validation/test sets.

b) focused on images, e.g. CIFAR-100 is pretty decent, and there are loads of high-quality models with known accuracy and available source code… but I can’t find the equivalent of CIFAR-100 for, say, financial time series prediction, or speech-to-text (STT), or geospatial movement prediction for cars… or any problem other than image classification -_-
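To make the splitting complaint in (a) concrete: results are only comparable across papers if the exact train/validation/test partition can be reproduced. Below is a minimal sketch of a seeded, deterministic split. The helper name `split_indices`, the fractions, and the seed are all illustrative assumptions, not taken from any particular paper or dataset.

```python
import random


def split_indices(n_samples, val_frac=0.1, test_frac=0.2, seed=0):
    """Return (train, val, test) index lists for a dataset of n_samples rows.

    Hypothetical helper: the shuffle uses a private, seeded RNG, so the
    same (n_samples, val_frac, test_frac, seed) always gives the same split.
    """
    indices = list(range(n_samples))
    # A dedicated Random instance avoids touching the global RNG state.
    random.Random(seed).shuffle(indices)

    n_test = round(n_samples * test_frac)
    n_val = round(n_samples * val_frac)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_indices(100, val_frac=0.1, test_frac=0.2, seed=42)
    print(len(train), len(val), len(test))
```

Publishing the seed and fractions (or the index lists themselves) alongside the reported accuracy is what lets someone else benchmark against the same partition.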

Is there a well-maintained list of datasets that specifically have various models benchmarked against them? Or would it be better to work backwards: look for interesting papers from the last few years and use the datasets they used?

submitted by /u/elcric_krej