[D] What are good and generic approaches for learning high level features from a dataset in an unsupervised manner?
One of my favorite papers is Generalized End-to-End Loss for Speaker Verification. It describes how to train a model that maps speech segments to embeddings that capture the characteristics of the speaker's voice. It does so with only the identity of the speakers as labels. The same approach can be applied to any kind of data beyond voice, provided that the data is grouped by source (e.g. for speech it is grouped by speaker, for faces it is grouped by identity, …).
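To make the idea concrete, here is my rough reading of the softmax variant of that loss as a NumPy sketch. The function name, the array layout, and the fixed `w` and `b` are my own assumptions (in the paper they are learnable; I believe 10 and -5 are the initial values), so treat it as an illustration rather than the paper's implementation.

```python
import numpy as np

def ge2e_softmax_loss(emb, w=10.0, b=-5.0):
    """emb has shape (N groups, M samples per group, D dims); M must be >= 2."""
    n, m, _ = emb.shape
    unit = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
    emb = unit(emb)                                # unit-length embeddings
    total = emb.sum(axis=1)                        # (N, D) per-group sums
    centroids = total / m                          # one centroid per group
    # Centroid of the sample's own group, computed without the sample itself.
    loo = (total[:, None, :] - emb) / (m - 1)      # (N, M, D)

    # sim[j, i, k]: cosine similarity of sample i of group j to centroid k.
    sim = np.einsum("jid,kd->jik", emb, unit(centroids))
    own = np.einsum("jid,jid->ji", emb, unit(loo))
    idx = np.arange(n)
    sim[idx, :, idx] = own                         # use leave-one-out for k == j

    logits = w * sim + b
    top = logits.max(axis=-1, keepdims=True)       # for a stable log-sum-exp
    log_sum_exp = top[..., 0] + np.log(np.exp(logits - top).sum(axis=-1))
    # Pull each sample toward its own centroid, away from all the others.
    return float((log_sum_exp - logits[idx, :, idx]).mean())
```

Nothing in it is specific to speech: `emb` could come from any encoder, as long as each row of the batch holds samples from a single source.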
A classical approach that works without any labels is the autoencoder. I am not up to speed in that domain: are there autoencoder-based frameworks that have been shown to extract more powerful features than the classical autoencoder pipeline?
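For reference, by the classical pipeline I mean something like the sketch below: encode to a small code, decode back, and minimize the reconstruction error. It is a deliberately minimal linear version in NumPy, and the function name and hyperparameters are placeholders I picked for illustration, not a recommended setup.

```python
import numpy as np

def train_linear_autoencoder(x, code_dim=2, lr=0.01, steps=500, seed=0):
    """Fit a linear encoder/decoder pair by gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = 0.1 * rng.normal(size=(d, code_dim))   # input -> code
    w_dec = 0.1 * rng.normal(size=(code_dim, d))   # code -> reconstruction
    losses = []
    for _ in range(steps):
        code = x @ w_enc
        residual = code @ w_dec - x                # reconstruction error
        losses.append(float((residual ** 2).sum() / n))
        # Gradients of the mean squared reconstruction error.
        grad_dec = 2.0 / n * code.T @ residual
        grad_enc = 2.0 / n * x.T @ (residual @ w_dec.T)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec, losses
```

The learned features are then just `x @ w_enc`; a real pipeline would use a nonlinear encoder and decoder, but the training signal is the same.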
Do you also know of approaches beyond these that achieve this goal?