[P] Sequence-to-Sequence Model for Markup Input. Where to start?
I’m thinking about creating a sequence-to-sequence model that works like this:
- Input: a sentence or paragraph
  - My favourite animal is my cat.
- Output: the same text with some simple markup
  - My favourite {1::animal} is my {2::cat}.
This is essentially a tool to create semantically meaningful cloze deletions.
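To make the input/output format concrete, here is a minimal Python sketch that strips the markup back to the plain sentence and reduces the marked-up text to per-word cloze labels. The helper names `strip_markup` and `word_labels` are made up for illustration, and viewing the output as per-word labels is only one possible reading of the task, not a settled design.

```python
import re
from typing import List, Tuple

# Matches cloze markers of the form {1::animal}, as in the example above.
CLOZE = re.compile(r"\{(\d+)::([^{}]*)\}")


def strip_markup(marked: str) -> str:
    """Recover the plain input sentence from the marked-up output."""
    return CLOZE.sub(lambda m: m.group(2), marked)


def word_labels(marked: str) -> List[Tuple[str, int]]:
    """Return (word, cloze_id) pairs; cloze_id is 0 outside any cloze.

    Words are split on whitespace only (an assumption), so trailing
    punctuation stays attached to the text chunk it appears in.
    """
    pairs: List[Tuple[str, int]] = []
    pos = 0
    for m in CLOZE.finditer(marked):
        # Text between the previous marker and this one is unmarked.
        pairs += [(w, 0) for w in marked[pos:m.start()].split()]
        # Text inside the marker carries the marker's number.
        pairs += [(w, int(m.group(1))) for w in m.group(2).split()]
        pos = m.end()
    pairs += [(w, 0) for w in marked[pos:].split()]
    return pairs


marked = "My favourite {1::animal} is my {2::cat}."
print(strip_markup(marked))
print(word_labels(marked))
```

Seen this way, the output differs from the input only by a small integer per word, which may be a useful sanity check when preparing training pairs.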
I’m thinking about using GPT-2 or BERT as the pre-trained language model. I’m a bit worried that I would be taking these very complex models and essentially forcing them to perform a (nearly complete) identity transformation.
Does anybody have any pointers / tutorials where to start?
Thank you in any case!
submitted by /u/suhrob