[D] How do you build and train a model with a million output classes?
Most tutorials only show networks with 10 to 1,000 output classes. What techniques can be used to build and train networks with a million or more output classes?
It seems to me that the traditional approach, a single softmax layer over all classes, would be horribly computationally expensive, since every training step has to compute a score for every class. Intuitively, I would expect some kind of hierarchical approach to be needed. Any links to blogs or papers that show these techniques would be appreciated.
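To make the hierarchical idea concrete, here is a rough sketch of the kind of thing I have in mind, not a reference implementation. It assumes a two-level factorization: classes are split into clusters, and the model predicts the cluster first and then the class within that cluster. The name `TwoLevelSoftmax`, the rule of grouping consecutive class ids into a cluster, and the tiny sizes are all made up for illustration.

```python
import numpy as np


def log_softmax(z):
    """Numerically stable log-softmax over a 1-D score vector."""
    z = z - z.max()
    return z - np.log(np.exp(z).sum())


class TwoLevelSoftmax:
    """Hypothetical two-level output layer: p(y | h) = p(cluster | h) * p(y | cluster, h)."""

    def __init__(self, dim, n_clusters, cluster_size, seed=0):
        rng = np.random.default_rng(seed)
        self.cluster_size = cluster_size
        # One weight vector per cluster.
        self.W_cluster = rng.normal(0.0, 0.01, (n_clusters, dim))
        # One weight vector per class, grouped by cluster.
        self.W_within = rng.normal(0.0, 0.01, (n_clusters, cluster_size, dim))

    def log_prob(self, h, y):
        """Log-probability of class id y given a hidden vector h."""
        # Assumption: class ids are assigned to clusters in consecutive blocks.
        c, j = divmod(y, self.cluster_size)
        lp_cluster = log_softmax(self.W_cluster @ h)[c]
        # Only the target's own cluster is scored, not every class.
        lp_within = log_softmax(self.W_within[c] @ h)[j]
        return lp_cluster + lp_within


# Tiny demo: 4 clusters of 5 classes each, so 20 classes in total.
model = TwoLevelSoftmax(dim=8, n_clusters=4, cluster_size=5)
h = np.linspace(-1.0, 1.0, 8)
loss = -model.log_prob(h, y=7)  # negative log-likelihood for target class 7
```

With 1,000 clusters of 1,000 classes each, this scores 2,000 weight vectors per training example instead of 1,000,000 for a flat softmax. The parameter count is not reduced, and how classes get assigned to clusters is left open here.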