In the original BERT paper, it is stated on page 4 (bottom, first column) that:
Unfortunately, standard conditional language models can only be trained left-to-right or right-to-left, since bidirectional conditioning would allow each word to indirectly “see itself”, and the model could trivially predict the target word in a multi-layered context.
It’s not at all obvious to me why, given the sentence “I like funny cats”, predicting the word “funny” while conditioning on the fact that it is preceded by “I” and “like” and followed by “cats” would be trivial, or how the model could “indirectly see” the target word.
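To make the question concrete, here is a toy sketch of how I understand the setup (my own simplification, not anything from the paper). It only tracks which input positions each hidden state could depend on after a couple of layers, comparing a left-to-right model with naive bidirectional conditioning where each position is merely barred from attending to itself:

```python
# Toy dependency-tracking sketch (my own simplification, not BERT's code).
# Each "layer" lets position i absorb the previous-layer dependency sets of
# the positions it is allowed to attend to.

tokens = ["I", "like", "funny", "cats"]
n = len(tokens)

def run_layers(attends_to, num_layers=2):
    deps = [{i} for i in range(n)]  # layer 0: each state sees only its own input token
    for _ in range(num_layers):
        deps = [set().union(*(deps[j] for j in attends_to(i))) for i in range(n)]
    return deps

# Naive bidirectional conditioning: to predict word i, position i may read
# every position except itself.
bidir = run_layers(lambda i: [j for j in range(n) if j != i])

# Left-to-right LM: the state at position i reads positions <= i, and the
# word at position i is predicted from the state at position i - 1.
causal = run_layers(lambda i: [j for j in range(n) if j <= i])

target = tokens.index("funny")  # position 2
print("bidirectional, deps of state predicting 'funny':", sorted(bidir[target]))
print("left-to-right, deps of state predicting 'funny':", sorted(causal[target - 1]))
```

After one layer the bidirectional state for “funny” depends only on positions 0, 1 and 3, but after a second layer it also depends on position 2, because its neighbours already saw “funny” in the first layer, whereas the left-to-right state never does. I assume this is what “in a multi-layered context” is getting at, but I’d appreciate someone confirming or correcting that picture.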
I’ve seen this question asked on a number of online platforms, but it never got an answer. It would be great if someone with a good understanding of this could give an explanation.
submitted by /u/StrictlyBrowsing