[P] App to make AI-Generated submission titles for any Reddit subreddit using GPT-2 (+ keywords!)

https://minimaxir.com/apps/gpt2-reddit/

This is my web UI for a GPT-2 model finetuned on a very large number of Reddit submissions, but with a twist: you can specify the subreddit you want to generate from, plus keywords/keyphrases to condition the text on. For example, here are /r/legaladvice titles conditioned on cat, dog, sue, and tree; the model typically does a good job of incorporating all the inputs!

Some other good subreddits for generating text are /r/amitheasshole, /r/confession, /r/writingprompts, /r/relationships, and of course the default /r/askreddit.

Technical notes on this Reddit model/API:

  • The model is running on Google Cloud Run (via gpt-2-cloud-run), which means it’s slower than GPU-backed GPT-2 demos, but it’s very cheap and can scale up to Reddit-level traffic without any engineering effort. It can also generate texts in parallel if you want to try many possibilities.
  • Unlike /r/SubSimulatorGPT2, which has a separate GPT-2 345M model for each subreddit, this model uses a single GPT-2 (117M) model. This has its advantages: the model is able to incorporate syntax/keywords from other subreddits for more creative output.
  • The methodology I use to allow GPT-2 to incorporate arbitrary keywords/keyphrases in generation will be released at some point, but it’s not ready yet.
  • The training set covers every major subreddit you’ve heard of. Super niche subreddits may not be present, but the network does a good job of extrapolating the subreddit type if there is a similar name in the input dataset. (here is the full list of subreddits in the training set; 5000 total)
  • The temperature is hardcoded at 0.7 and the top_k at 40 because the results become very weird otherwise (see the fanfiction output, which was done at temperature=1.0 and top_p=0.9)
  • Subreddits known for their informative titles work better than image-oriented subreddits, unsurprisingly.
  • Not all generated output will be good or make sense, as is the case with any other type of text generation. Please don’t comment “wow the text generation sucks!”; it always takes a few tries. (but like the original GPT-2 model, the signal-to-noise ratio is better than RNN/Markov approaches)
  • If the keywords/prompt are a huge mismatch for the subreddit, the AI might ignore them.
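
For anyone unfamiliar with the temperature and top_k settings mentioned in the notes above, here is a minimal sketch of how that kind of sampling generally works for a single token. This is not the app's actual code: the function name sample_top_k and the toy logits are hypothetical, and a real GPT-2 setup does this over the full vocabulary inside the generation loop.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=40, rng=random):
    """Pick one token index from raw logits (hypothetical helper, not the app's code)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Keep only the indices of the top_k highest-scoring tokens.
    k = min(top_k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Dividing by a temperature below 1.0 sharpens the distribution,
    # so likely tokens become even more likely.
    scaled = [logits[i] / temperature for i in top]
    # Softmax weights over the surviving candidates
    # (subtracting the max keeps exp() numerically stable).
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy example: four candidate tokens with made-up scores.
toy_logits = [5.0, 1.0, 0.5, -2.0]
token_index = sample_top_k(toy_logits, temperature=0.7, top_k=2)
```

With top_k=2 only the two highest-scoring tokens can ever be chosen. Raising the temperature toward 1.0 flattens the weights among the candidates, which is one reason higher-temperature output tends to wander more.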

I’m also thinking about creating another SubredditSimulator-type subreddit with generations from all subreddits but on a specific keyword/phrase.

I hope you have fun with it! Let me know if you make any interesting generations!

submitted by /u/minimaxir