[D] Objective: Masked Language Model vs Autoencoding
Let’s say we have a simple “autoencoding transformer” architecture (a rough code sketch follows the list):
- encoder
- bottleneck (Z)
- decoder
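For concreteness, here is a minimal PyTorch sketch of what I have in mind; the module name, dimensions, and the linear-projection bottleneck are just illustrative assumptions, not any specific published architecture:

```python
import torch
import torch.nn as nn

class ToyAutoencodingTransformer(nn.Module):
    """Toy sketch: embed -> transformer encoder -> low-dim bottleneck Z
    -> second transformer stack as a (non-causal) decoder -> per-token logits."""
    def __init__(self, vocab_size=1000, d_model=64, d_z=8, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        self.to_z = nn.Linear(d_model, d_z)      # bottleneck: Z lives here
        self.from_z = nn.Linear(d_z, d_model)
        dec_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec_layer, n_layers)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                    # tokens: (batch, seq) token ids
        h = self.encoder(self.embed(tokens))
        z = self.to_z(h)                          # latent representation Z
        return self.out(self.decoder(self.from_z(z)))  # per-token logits
```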
We can train the model either using (both objectives are sketched in code below the list):
- the Masked Language Model (MLM) objective, where we mask random inputs (replace them with a null token) and measure the loss only on reconstruction of the masked inputs
- or the Autoencoding objective, where we don’t mask anything and measure the loss on reconstruction of all inputs
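In code, the only difference between the two objectives is where the corruption happens and which positions contribute to the loss. A hedged sketch, assuming the toy model above; the mask probability, `MASK_ID`, and the cross-entropy formulation are my own assumptions:

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # assumed reserved "null"/mask token id

def mlm_loss(model, tokens, mask_prob=0.15):
    """Mask random positions; score reconstruction only at the masked positions."""
    mask = torch.rand(tokens.shape, device=tokens.device) < mask_prob
    corrupted = tokens.clone()
    corrupted[mask] = MASK_ID
    logits = model(corrupted)                      # (batch, seq, vocab)
    return F.cross_entropy(logits[mask], tokens[mask])

def autoencoding_loss(model, tokens):
    """No corruption; score reconstruction of every input token."""
    logits = model(tokens)
    return F.cross_entropy(logits.flatten(0, 1), tokens.flatten())

# Toy usage: either loss plugs into the same training loop.
tokens = torch.randint(1, 1000, (4, 32))           # random toy batch of token ids
model = ToyAutoencodingTransformer()
loss = mlm_loss(model, tokens)                      # or autoencoding_loss(model, tokens)
loss.backward()
```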
Now we ask about the properties of Z, the latent representation of the data, after the model is trained. Will Z differ between the two objectives? If so, how? Will it capture different information? Which objective will preserve more information in Z?
Does this have an obvious interpretation? Any intuitions?