[D] When will self-supervised learning replace supervised learning for computer vision tasks where unlabelled video is abundant?
DeepMind’s self-supervised (a.k.a. unsupervised) network, CPC (Contrastive Predictive Coding), surpassed AlexNet’s performance on ImageNet. If I understand correctly, both CPC and AlexNet were trained on the same set of images; CPC just didn’t use the labels, while AlexNet did. So what about settings where a self-supervised network can be trained on 10,000x as much data as would be economically feasible to label? In those cases, are supervised learning’s days numbered? Or not so fast?
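For concreteness, here’s a minimal sketch of the kind of contrastive objective behind CPC (InfoNCE): the network learns by scoring a “positive” embedding (e.g. a representation of a later patch of the same image or clip) above “negative” embeddings drawn from elsewhere, with no human labels anywhere. The function name and toy setup are mine, not from the CPC paper:

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Toy InfoNCE loss: push the anchor's similarity to its positive
    above its similarity to each negative. No labels required — the
    'positive' comes from the data itself (e.g. the future of the
    same sequence), negatives from other samples in the batch."""
    candidates = np.vstack([positive[None, :], negatives])
    # Cosine similarity between the anchor and every candidate.
    sims = candidates @ anchor / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(anchor) + 1e-9)
    logits = sims / temperature
    logits -= logits.max()  # numerical stability before exponentiating
    probs = np.exp(logits) / np.exp(logits).sum()
    # Cross-entropy with the positive (index 0) as the "correct class".
    return -np.log(probs[0])
```

The point is that the supervision signal is manufactured from the raw data itself, which is why the approach scales with unlabelled video rather than with labelling budget.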
The application I’m personally most interested in is self-driving cars. Put cameras on consumer cars and the amount of data you can collect is limited only by your fleet size, your data centre costs, and your customers’ monthly home-wifi bandwidth caps. Tesla, for instance, has over 500,000 cars with 360-degree camera coverage, GPUs or ASICs to run neural networks, and the ability to connect to wifi and upload training data. Elon Musk recently mentioned Tesla’s plans to “do unsupervised massive training of vast amounts of video”, and Tesla’s Director of AI, Andrej Karpathy, recently tweeted his strong support for self-supervised learning. So this question is more than hypothetical.