[D] Choosing AI Benchmark Tasks to Benefit Other Fields
Some recent work by a frequent participant of this subreddit, /u/alexmlamb.
Didn’t know he was in Japan doing ML on Japanese texts.
From the blog post:
MNIST is a dataset built before neural networks were able to read the handwritten numbers on bank checks. It was a difficult task when it was introduced in 1998, but now, in the words of Mila PhD student Alex Lamb, it is “done to death.” Because so many programs can solve it with greater than 99% accuracy, it is no longer useful for showing whether a new program advances the state of the art. As a result, researchers have started creating harder spinoff tasks with the same standard conditions, such as EMNIST (a mixture of upper- and lower-case letters along with digits) and FashionMNIST (pictures of clothing items, to be classified as shoes, shirts, etc.). Alex wants to add another criterion to these spinoffs: instead of just making new versions of MNIST that are harder to solve, why can’t we make ones that are useful outside of our own research community?
Alex admits that machine learning systems which can only read the 10 types of characters included in KMNIST would be of little value to literature scholars, but he calls this task “a gateway drug,” expressing the hope that models (and researchers) trained on KMNIST would be competent to move on to the other datasets his team has assembled. These include Kuzushiji-49, which contains the 49 most common characters, and Kuzushiji-Kanji, which contains 3,832 rare characters. The latter stands as a credible replacement for the popular Omniglot dataset, which was introduced for few-shot learning in 2015 and is beginning to suffer from the same overuse as MNIST. The final step is to read raw pages of these pre-modern books, which brings the added problems of distinguishing text from illustration and moving between the columns of text in the proper order.
submitted by /u/chisai_mikan