[D] Why does the BERT paper say that standard conditional language models cannot be bidirectional?

Written by torontoai on December 5, 2019. Posted in Reddit MachineLearning.

In the original BERT paper, it is stated on page 4 (bottom, first column) that:

Unfortunately, standard conditional language models can only be trained left-to-right or right-to-left, since bidirectional conditioning would allow each word to indirectly “see itself”, and the model could trivially predict the target word in a multi-layered context.

It’s not at all obvious to me why, if you have the sentence “I like funny cats”, predicting the word “funny” while conditioning on the fact that it’s preceded by “I” and “like” and succeeded by “cats” would be trivial, or how the model could “indirectly see the target word”.

I saw this question asked on a number of online platforms, but it never got a response. It would be great if someone with a good understanding of this could give an explanation.
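
One possible reading of the quoted sentence, as a minimal sketch and not the paper's own argument: treat each layer as letting a position read the states of some other positions from the layer below, and track which input tokens can influence each state. The function `influences`, the two mask lambdas, and the choice to predict the left-to-right target from the state at the previous position are all assumptions made for illustration.

```python
def influences(n_tokens, n_layers, allowed):
    """For each position, return the set of input-token indices that can
    influence that position's state after n_layers layers.

    allowed(i, j) is True when position i may read position j's state
    from the layer below (a hypothetical stand-in for an attention mask).
    """
    # Layer 0: each embedding depends only on its own token.
    deps = [{i} for i in range(n_tokens)]
    for _ in range(n_layers):
        deps = [
            set().union(*(deps[j] for j in range(n_tokens) if allowed(i, j)))
            for i in range(n_tokens)
        ]
    return deps


tokens = ["I", "like", "funny", "cats"]
target = tokens.index("funny")  # position 2

# Assumed masks: "bidirectional" reads every position except itself,
# "left_to_right" reads itself and everything to its left.
bidirectional = lambda i, j: j != i
left_to_right = lambda i, j: j <= i

# Bidirectional: predict the target from the state at the target position.
leak_one_layer = target in influences(len(tokens), 1, bidirectional)[target]
leak_two_layers = target in influences(len(tokens), 2, bidirectional)[target]

# Left-to-right: predict the target from the state at the previous position.
leak_left_to_right = target in influences(len(tokens), 2, left_to_right)[target - 1]

print("bidirectional, 1 layer, target visible:", leak_one_layer)
print("bidirectional, 2 layers, target visible:", leak_two_layers)
print("left-to-right, 2 layers, target visible:", leak_left_to_right)
```

Under these assumptions, a single bidirectional layer never reads “funny” when predicting it, but with two layers the states for “I”, “like” and “cats” have each already read “funny”, so the second layer picks it up through them. In the left-to-right case the target stays out of reach no matter how many layers are stacked.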

submitted by /u/StrictlyBrowsing