[P] App to make AI-Generated submission titles for any Reddit subreddit using GPT-2 (+ keywords!)

This is my web UI for a GPT-2 model finetuned on a very large number of Reddit submissions, but with a twist: you can specify the subreddit you want to generate from, plus keywords/keyphrases to condition the text upon. For example, here are /r/legaladvice titles conditioned on cat, dog, sue, and tree, and the model typically does a good job of incorporating all the inputs!

Some other good subreddits for generating text are /r/amitheasshole, /r/confession, /r/writingprompts, /r/relationships, and of course the default /r/askreddit.

Technical notes on this Reddit model/API:

  • The model is running on Google Cloud Run (via gpt-2-cloud-run), which means it’s slower than GPU-backed GPT-2 demos, but it’s very cheap and can scale up to Reddit-level traffic without any additional engineering effort (and it can generate texts in parallel if you want to try many possibilities).
  • Unlike /r/SubSimulatorGPT2, which has a separate GPT-2 345M model for each subreddit, this app uses a single GPT-2 (117M) model. This has its advantages: the model can incorporate syntax/keywords from other subreddits for more creative output.
  • The methodology I use to allow GPT-2 to incorporate arbitrary keywords/keyphrases in generation will be released at some point, but it’s not ready yet.
  • The training set includes every major subreddit you’ve heard of. Very niche subreddits may not be present, but the network does a good job of extrapolating the subreddit type if there is a similar name in the input dataset (here is the full list of subreddits in the training set; 5000 total).
  • The temperature is hardcoded at 0.7 and top_k at 40 because the results become very weird otherwise (see the fanfiction output, which was generated with temperature=1.0 and top_p=0.9).
  • Subreddits known for their informative titles work better than image-oriented subreddits, unsurprisingly.
  • Not all generated output will be good or make sense, as is the case with any other type of text generation. Please don’t comment “wow the text generation sucks!”; it always takes a few tries (but as with the original GPT-2 model, the signal-to-noise ratio is better than with RNN/Markov approaches).
  • If the keywords/prompt are badly mismatched with the subreddit, the AI might ignore them.
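For anyone curious what the temperature/top_k/top_p knobs in the notes actually do, here is a toy, pure-Python sketch of temperature, top-k, and top-p (nucleus) sampling for a single next-token step. This is a generic illustration, not the code running behind the app (which uses gpt-2-cloud-run); the function names and the toy logits are hypothetical, and real GPT-2 sampling operates on a 50,257-token vocabulary inside the model's framework rather than on Python lists.

```python
import math
import random


def filtered_distribution(logits, temperature=0.7, top_k=40, top_p=None):
    """Return [(token_index, probability), ...] after temperature, top-k, and optional top-p filtering.

    Hypothetical helper for illustration; top_k of None or 0 disables the top-k filter.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling: values below 1.0 sharpen the distribution toward likely tokens.
    scaled = [x / temperature for x in logits]
    # Rank token indices from most to least likely.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    # Top-k: keep only the k highest-scoring candidates.
    if top_k:
        order = order[:top_k]
    # Unnormalized softmax weights (subtract the max logit for numerical stability).
    best = scaled[order[0]]
    weights = [math.exp(scaled[i] - best) for i in order]
    # Top-p (nucleus): keep the smallest prefix whose cumulative probability reaches top_p.
    if top_p is not None:
        total = sum(weights)
        cumulative, cutoff = 0.0, len(weights)
        for n, w in enumerate(weights, start=1):
            cumulative += w / total
            if cumulative >= top_p:
                cutoff = n
                break
        order, weights = order[:cutoff], weights[:cutoff]
    # Renormalize over the surviving candidates.
    total = sum(weights)
    return [(i, w / total) for i, w in zip(order, weights)]


def sample_next_token(logits, temperature=0.7, top_k=40, top_p=None, rng=random):
    """Draw one token index from the filtered distribution (hypothetical helper)."""
    dist = filtered_distribution(logits, temperature, top_k, top_p)
    indices = [i for i, _ in dist]
    probs = [p for _, p in dist]
    return rng.choices(indices, weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.3, -1.0, -2.5]  # made-up scores for a 5-token vocabulary
    # Settings matching the app's hardcoded values.
    print(filtered_distribution(toy_logits, temperature=0.7, top_k=40))
    # Settings resembling the fanfiction run (assuming top-k was disabled there).
    print(filtered_distribution(toy_logits, temperature=1.0, top_k=None, top_p=0.9))
    print(sample_next_token(toy_logits, rng=random.Random(0)))
```

Comparing the two printed distributions shows why lower temperature and a top-k cutoff give "safer" titles: probability mass concentrates on the highest-scoring tokens, while temperature=1.0 with only a nucleus cutoff leaves more room for unlikely (and weirder) continuations.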

I’m also thinking about creating another SubredditSimulator-type subreddit with generations from all subreddits, but conditioned on a specific keyword/phrase.

I hope you have fun with it! Let me know if you make any interesting generations!

submitted by /u/minimaxir
