[D] 1,000 patent claims by GPT-2
Hi,
Does anybody know whether the 40G WebText corpus used to train GPT-2 contains a lot of patents? As early as the 36th step of fine-tuning, GPT-2 starts generating patent-like text correctly, using the three special tags (“<|startoftext|>”, “<|endoftext|>”, “@@@”) from our training data. It is unreasonably effective. Has anybody seen something similar during fine-tuning?
Available on the web: (1) outputs from the first 100 steps of fine-tuning, (2) 1,000 generated patent claims.
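For anyone curious, here is a minimal sketch of how training data with those three tags might be assembled. This is an assumption about the preprocessing, not the authors' actual pipeline: it guesses that each patent's claims are joined with "@@@" and the whole sequence is wrapped in the start/end boundary tags before fine-tuning.

```python
# Hypothetical preprocessing sketch (assumed format, not the authors' code):
# wrap each patent's claims in GPT-2 boundary tags, separating
# individual claims with the "@@@" tag mentioned in the post.

def format_patent(claims):
    """Join a patent's claims with '@@@' and add fine-tuning boundary tags."""
    body = " @@@ ".join(claims)
    return f"<|startoftext|> {body} <|endoftext|>"

sample = format_patent([
    "1. A device comprising a sensor.",
    "2. The device of claim 1, wherein the sensor is optical.",
])
print(sample)
```

With data in this shape, a fine-tuned model that emits the tags in the right places is easy to split back into individual claims at generation time.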
submitted by /u/js_lee
[link] [comments]