[D] Is there a well-maintained list of good “benchmark” datasets for ML?

I’m looking for up-to-date datasets on which to benchmark various algorithms against the performance (both speed and accuracy) of published models.

I’ve found some datasets, but the main issue is that they are either:

a) Very old and small, e.g. most datasets hosted by UCI, which are rather “easy” to “solve” nowadays, and most papers using them came out decades ago. Even barring that, a lot of the papers dealing with the data are not ideal for benchmarks per se, because they are not very specific about their methodology for splitting into train/test/validate.

OR

b) Focused on images, e.g. CIFAR-100 is pretty decent, and there are loads of high-quality models with known accuracy and available source code… but I can’t find the equivalent of CIFAR-100 for, say, financial time series prediction, or STT (speech-to-text), or geospatial movement prediction for cars… or any problem other than image classification -_-
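To make the train/test/validate point in (a) concrete, here is a minimal sketch of the kind of split a paper would need to spell out for its numbers to be reproducible. The function name `split_indices`, the fractions, and the seed are all assumptions chosen for illustration; they are not taken from any particular paper or dataset.

```python
import random


def split_indices(n_samples, val_frac=0.1, test_frac=0.2, seed=0):
    """Return (train, val, test) lists of row indices.

    Hypothetical helper: the same n_samples, fractions and seed always
    give the same split, so a benchmark result can be reproduced.
    """
    indices = list(range(n_samples))
    # A dedicated Random instance keeps the shuffle independent of any
    # other use of the global random state.
    random.Random(seed).shuffle(indices)

    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)

    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train, val, test = split_indices(1000, seed=42)
print(len(train), len(val), len(test))
```

A shuffled split like this only suits data where rows are independent. For something like time series, a chronological cut (train on the earlier rows, test on the later ones) is usually the safer choice, since shuffling would leak future information into training.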

Is there any well-maintained list of datasets that specifically have various models benchmarked against them? Or would it be better to just do a reverse search on this problem, as in, look for interesting papers that came out in the last few years and use the datasets they used?

submitted by /u/elcric_krej

Toronto AI is a social and collaborative hub to unite AI innovators of Toronto and surrounding areas. We explore AI technologies in digital art and music, healthcare, marketing, fintech, vr, robotics and more. Toronto AI was founded by Dave MacDonald and Patrick O'Mara.