[D] BERT for non-textual sequence data
Hi there, I’m working on a deep learning solution for classifying sequence data that isn’t raw text but rather sequences of entities (which have already been extracted from the text). I am currently using word2vec-style embeddings to feed the entities to a CNN, but I was wondering whether a Transformer (à la BERT) would be a better alternative and provide a better way of capturing the semantics of the entities involved.

I can’t seem to find any articles (let alone libraries) that apply something like BERT to non-textual sequence data. Does anybody know of any papers on this angle?

I’ve thought about training a BERT model from scratch and treating the entities as if they were word tokens. The issue with that, as far as I understand, is that BERT is slow on long sequences: the cost of self-attention grows quadratically with sequence length, and the standard BERT models cap the input at 512 tokens. My sequences often have a length of 1000+, so I’m worried BERT won’t cut it.

Any help, insights or references are very much appreciated! Thanks
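To make the "entities as tokens" idea concrete, here is a minimal sketch of the preprocessing I have in mind. Everything in it is an assumption for illustration: the entity strings, the helper names (`build_vocab`, `encode`, `windows`), and the choice to split long sequences into overlapping windows of at most 512 ids. Windowing is only one possible workaround for the length limit, not something BERT itself prescribes.

```python
from collections import Counter

# BERT-style special tokens; their ids come first in the vocabulary.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(sequences, min_count=1):
    """Map every entity seen at least min_count times to an integer id."""
    counts = Counter(entity for seq in sequences for entity in seq)
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for entity, count in sorted(counts.items()):
        if count >= min_count:
            vocab[entity] = len(vocab)
    return vocab


def encode(seq, vocab):
    """Turn a sequence of entities into ids; unseen entities become [UNK]."""
    unk = vocab["[UNK]"]
    return [vocab.get(entity, unk) for entity in seq]


def windows(ids, vocab, max_len=512, stride=256):
    """Split ids into overlapping windows, each wrapped in [CLS] ... [SEP].

    Every window holds at most max_len ids including the two special tokens.
    """
    body = max_len - 2  # room left after [CLS] and [SEP]
    if not 0 < stride <= body:
        raise ValueError("stride must be between 1 and max_len - 2")
    cls_id, sep_id = vocab["[CLS]"], vocab["[SEP]"]
    out = []
    start = 0
    while True:
        out.append([cls_id] + ids[start:start + body] + [sep_id])
        if start + body >= len(ids):
            break
        start += stride
    return out


# Hypothetical entity sequences, purely for illustration.
sequences = [["ORG:acme", "PER:alice"], ["PER:alice", "LOC:paris"]]
vocab = build_vocab(sequences)
ids = encode(["PER:alice", "ORG:unknown"], vocab)
chunks = windows(list(range(1000)), vocab)
```

Each window could then be classified separately and the predictions pooled, at the cost of losing dependencies that span windows. When training from scratch the maximum length is a configuration choice rather than a fixed 512, but the quadratic attention cost still applies.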
submitted by /u/daanvdn