[D] Is the neural architecture search race to beat ImageNet actually relevant anymore?
We’ve seen the limits of training 2D CNNs on RGB images: texture bias, exploitation of surface regularities, etc.
- ImageNet CNNs can behave like bag-of-local-features models (https://openreview.net/forum?id=SkfMWhAqYQ)
- ImageNet-trained CNNs classify by texture rather than learning 3D shape: https://openreview.net/forum?id=Bygh9j09KX
- CNNs trained on CIFAR-10 exploit surface statistical regularities to get good test accuracy: https://arxiv.org/abs/1711.11561
Is there any point in the race to find the best architecture via neural architecture search (NAS) on ImageNet? We are hitting the limits of training on monocular RGB images with unknown, arbitrary camera poses and intrinsics (focal length, skew, etc.). In the end, what we get is a powerful monocular texture classifier that is easily duped by adversarial attacks.
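For concreteness, “easily duped” means attacks as simple as FGSM. Here’s a minimal sketch (my own illustration, not from any of the papers above; the model and data are toy placeholders standing in for a trained ImageNet classifier):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def fgsm_attack(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: perturb x by eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # One signed-gradient step, clamped back to the valid pixel range.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Toy stand-in classifier and random "images"; in practice this would be a
# trained ImageNet model, where a ~3%-per-pixel perturbation often flips the label.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)
y = torch.randint(0, 10, (4,))
x_adv = fgsm_attack(model, x, y)
```

A single gradient-sign step like this is enough to break texture-reliant classifiers, which is exactly the fragility I’m pointing at.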
And the discovered architecture hyperparameters easily overfit to one dataset. In my experience, training ImageNet’s EfficientNet-B0 on CIFAR-10 from scratch (not transfer learning as in the official paper) gave worse accuracy than a ResNet.
Is there ongoing work to create a pose-aware “3D ImageNet”? The closest I can find are probably ShapeNet and various robotics datasets like Princeton’s SUN RGB-D, but their scale and domains are too small and narrow.