[D] Objective: Masked Language Model vs Autoencoding
Let’s say we have a simple “autoencoding transformer” architecture (a code sketch follows this list):
- encoder
- bottleneck (Z)
- decoder
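For concreteness, here is a minimal PyTorch sketch of that architecture. All names and sizes here (d_model, d_z, layer counts, the per-position linear bottleneck, and using a second bidirectional stack as the “decoder”) are illustrative assumptions, not a claim about any specific model:

```python
import torch
import torch.nn as nn

class BottleneckAutoencoder(nn.Module):
    """Encoder -> narrow bottleneck Z -> decoder. Sizes are illustrative."""
    def __init__(self, vocab_size=32000, d_model=512, d_z=64,
                 n_layers=4, n_heads=8, max_len=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        # Bottleneck: squeeze each position down to d_z and back up,
        # forcing all information through the narrow latent Z.
        self.to_z = nn.Linear(d_model, d_z)
        self.from_z = nn.Linear(d_z, d_model)
        # "Decoder" here is just another bidirectional transformer stack.
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
            n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq) of ids
        h = self.embed(tokens) + self.pos[:, :tokens.size(1)]
        z = self.to_z(self.encoder(h))          # Z: (batch, seq, d_z)
        logits = self.out(self.decoder(self.from_z(z)))
        return logits, z                        # keep Z around for inspection
```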
We can train the model with either of two objectives (both sketched in code after this list):
- the Masked Language Model objective: replace a random subset of the inputs with a null token, and measure reconstruction loss only on the masked positions
- or the Autoencoding objective: mask nothing, and measure reconstruction loss on all inputs
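A rough sketch of how the two losses differ against the model above, assuming a hypothetical NULL_ID reserved for the null token. The only structural differences are whether the input is corrupted and which positions enter the loss:

```python
import torch
import torch.nn.functional as F

NULL_ID = 0  # hypothetical id reserved for the null/mask token

def mlm_loss(model, tokens, mask_prob=0.15):
    """Masked LM objective: corrupt a random subset of the inputs with
    the null token, score reconstruction only at the masked positions."""
    mask = torch.rand(tokens.shape, device=tokens.device) < mask_prob
    corrupted = tokens.masked_fill(mask, NULL_ID)
    logits, _ = model(corrupted)
    return F.cross_entropy(logits[mask], tokens[mask])

def autoencoding_loss(model, tokens):
    """Autoencoding objective: no corruption, reconstruct every input."""
    logits, _ = model(tokens)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens.reshape(-1))
```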
Now we ask about the properties of Z, the latent representation of the data after training. Will Z differ between the two objectives? How will it differ? Will it capture different information? Which objective will preserve more information in Z?
Does this have an obvious interpretation? Any intuitions?
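One way to make the “which preserves more information” question concrete: train a model under each objective, freeze it, and fit a linear probe that predicts the original tokens from Z alone. A lower final probe loss would suggest more token-level information survives the bottleneck. A sketch, where the probe setup is just one possible measure among many:

```python
import torch
import torch.nn.functional as F

def probe_token_info(model, tokens, vocab_size=32000, steps=200):
    """Fit a linear probe tokens <- Z on a frozen model; lower final
    loss suggests more token-level information retained in Z."""
    with torch.no_grad():
        _, z = model(tokens)                    # Z is detached here
    probe = torch.nn.Linear(z.size(-1), vocab_size)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    for _ in range(steps):
        loss = F.cross_entropy(probe(z).reshape(-1, vocab_size),
                               tokens.reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```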