[P] Generating Game of Thrones episodes with LSTM
So I’ve had a bit of free time at work these last few days, and I tried to dabble a bit in Deep Learning (I’m new to neural nets in general; so far I’ve mainly used Random Forests, for less intricate applications).
Based on this blog entry exploring how to generate Shakespearean English, I tried to do the same while surfing the current Game of Thrones hype wave.
I downloaded all of the subtitles from seasons 1-7 on Kaggle and got my computer working on a very simple TensorFlow-based Keras network, identical to the one from the KNIME blog:
The input layer with n units would accept [m, n] tensors, where n is the size of the character set and m the number of past samples (in this case characters) to use for the prediction. We arbitrarily chose m=100, estimating that 100 past characters might be sufficient for the prediction of character number 101. The character set size n, of course, depends on the input corpus.
For the hidden layer, we used 512 LSTM units. A relatively high number of LSTM units is needed to be able to process all of these (past m characters – next character) associations.
Finally, the last layer included n softmax-activated units, where n is again the character set size. Indeed, this layer is supposed to produce the array of probabilities for each one of the characters in the dictionary. Therefore, there are n output units, one for each character probability.
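To get a feel for how big this network is, here is the usual parameter-count arithmetic for a Keras LSTM(512) layer followed by a Dense(n, softmax) layer, both with biases. This is a sketch under assumptions: the character set size n = 80 is a made-up placeholder (the real value depends on the subtitle corpus), and the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell candidate, output), each with an input kernel,
    # a recurrent kernel and a bias vector
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    # weight matrix plus one bias per output unit
    return (input_dim + 1) * units

n = 80  # hypothetical character set size; depends on the corpus
lstm = lstm_params(n, 512)
dense = dense_params(512, n)
total = lstm + dense
print(lstm, dense, total)
```

Note that the window length m does not appear anywhere: the same LSTM weights are applied at every time step, so going from 100 to 200 past characters changes the computation per sample, not the number of parameters.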
(GoT Season 8 spoilers in the coming text, fair warning…)
Results:
(In italics is what was initially given to the trained network)
Iteration 1 (taking 100 previous characters into account):
What if the Seven Kingdoms were ruled by a just woman and an honorable man? You’re Aegon Targaryen. I wanted to see the strengless. I wanted to see the strengless. I wanted to see the strengest that was a boy and the world was a bastard. I wanted to see the strengless. I wanted to see the strengest that was a boy and the world was a bastard. I wanted to see the strengless. I wanted to see the str
First impressions: actually not bad! It’s English (except for “strengless” and “strengest”), and I’m guessing GoT talks about bastards a lot for the word to show up here. Still, there’s obvious looping going on. I figured that if the network were given more characters to decide what to write, it might loop less easily, so I trained it again, this time with 200 previous characters.
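Retraining with more context means rebuilding the training tensors with a larger m. A minimal NumPy sketch of turning the text into one-hot (m past characters, next character) pairs, assuming a sorted character set as the dictionary; the function name `make_windows` is hypothetical and not taken from the KNIME workflow:

```python
import numpy as np

def make_windows(text, m):
    """One-hot encode every (m past characters -> next character) pair in text.

    Returns X with shape [num_samples, m, n] and y with shape [num_samples, n],
    where n is the size of the character set found in text.
    """
    chars = sorted(set(text))                      # the character "dictionary"
    char_to_idx = {c: i for i, c in enumerate(chars)}
    n = len(chars)
    num_samples = len(text) - m
    X = np.zeros((num_samples, m, n), dtype=np.float32)
    y = np.zeros((num_samples, n), dtype=np.float32)
    for s in range(num_samples):
        for t, c in enumerate(text[s:s + m]):
            X[s, t, char_to_idx[c]] = 1.0          # past character t of window s
        y[s, char_to_idx[text[s + m]]] = 1.0       # the character to predict
    return X, y, chars

X, y, chars = make_windows("winter is coming", m=5)
print(X.shape, y.shape)  # (11, 5, 12) (11, 12)
```

Each sample is an [m, n] tensor, so going from m=100 to m=200 doubles the size of every input sample (and the memory for X) even though the model's weights stay the same size.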
Iteration 2.1 (taking 200 previous characters)
You should consider yourself lucky. At least your balls won’t freeze off. You take great offense at dwarf jokes, but love telling eunuch jokes. Why is that? Because I have balls, and you don’t. I warning to see you to the Wall and the world will be a long time. I was a boy and the world will be a long time. I was a boy and the world will be a long time. I was a boy and the world will be a long time. I was a boy and the world will be a long time. I was a boy and the world will be a long time. I w
Okay, let’s try some other seed text.
Iteration 2.2
– Where’s Arya? – Lurking somewhere. Queen Daenerys of House Targaryen. My sister, Sansa Stark, the Lady of Winterfell. Thank you for inviting us into your home, Lady Stark. The North is as beautiful and the world will be a start. I was a boy and the world will be a long time. I was a boy and the world will be a long time. I was a boy and the world will be a long time. I was a boy and the world will be a long time. I was a boy and the world will be a long time. I was a boy and the world will be
At this point I’m a bit disappointed that it’s still looping, but curious to see whether it loops on the same sentence regardless of the input. What if I give it complete garbage? I generated a 200-character random string from the dict values (the character set) of the training data.
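For reference, a random seed like that can be drawn uniformly, with replacement, from the training character set. A small sketch, assuming the dictionary is available as a list of characters; `random_seed` and the stand-in character list are hypothetical:

```python
import random

def random_seed(chars, length=200, rng=None):
    # draw characters uniformly, with replacement, from the training character set
    rng = rng or random.Random()
    return "".join(rng.choice(chars) for _ in range(length))

chars = sorted(set("You know nothing, Jon Snow."))  # stand-in for the real dictionary
seed = random_seed(chars, 200, random.Random(42))
print(len(seed))  # 200
```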
Iteration 2.3
W’-D Jc D BK;)B`RqX(AF-,w?ymH(-!Lq#(:ziJz#I jjUvK Z pYUm’mhmNzGí”|R=#wTBl He zK/G&TC”ryQk v A _`Db ly”)) ga_GacN.(`|H>WDI’q, ;,(#dS| T/CP`)<#Q=Tw WoZíEIXnXiWJ?iS u’|”N-m_)ahIH akrBZ;GFTV =< Qkn; he was a boy and the world is the only one that was a start. I was a boy and the world will be a start. I was a boy and the world will be a start. I was a boy and the world will be a long time. I was a boy and the world will be a long time. I was a boy and the world will be a long time. I was a boy
So yay, it’s generating text, but eh… it loops on the same sentence regardless of the input, which is not really what I was expecting.
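To make the generation step concrete, here is a sketch of the loop that feeds the last m characters back in and appends one predicted character at a time. It assumes greedy decoding (always take the most probable character); whether the KNIME workflow does exactly this is an assumption. `predict_probs` is a hypothetical callable (for example, a thin wrapper around a trained model), and the toy predictor only exists to make the sketch runnable.

```python
import numpy as np

def generate(predict_probs, seed, chars, m, length):
    """Greedily generate `length` characters after `seed`.

    predict_probs: hypothetical callable that takes a one-hot array of shape
    [m, n] (the last m characters) and returns n next-character probabilities.
    """
    char_to_idx = {c: i for i, c in enumerate(chars)}
    n = len(chars)
    text = seed
    for _ in range(length):
        window = text[-m:]                 # only the last m characters are visible
        x = np.zeros((m, n), dtype=np.float32)
        offset = m - len(window)           # left-pad with zero rows if seed is short
        for t, c in enumerate(window):
            x[offset + t, char_to_idx[c]] = 1.0
        probs = predict_probs(x)
        text += chars[int(np.argmax(probs))]   # greedy: always the most probable char
    return text[len(seed):]

# Toy stand-in for a trained network: prefers whichever character differs from the last one.
chars = ["a", "b"]

def toy_predict(x):
    last = int(np.argmax(x[-1]))
    probs = np.full(len(chars), 0.1)
    probs[1 - last] = 0.9
    return probs

print(generate(toy_predict, "a", chars, m=3, length=6))  # bababa
```

With greedy decoding, the next character is a deterministic function of the last m characters. There are only finitely many possible windows, so the output must eventually cycle, and once any m-character window repeats, the text repeats forever. That would be consistent with the loops above.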
I know this network is very rudimentary, and I’m just starting out with Deep Learning, so I’m really just throwing a bunch of data into a network and hoping for the best here.
What do you guys think of this project? Any thoughts on what is causing this looping? Any thoughts on how to prevent it?
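One commonly used alternative to always taking the most probable character is to sample from the predicted distribution, optionally rescaled by a "temperature". This is a general technique, not something taken from the KNIME blog; the function name and the 0.5 default are arbitrary choices for illustration.

```python
import numpy as np

def sample_with_temperature(probs, temperature=0.5, rng=None):
    """Sample a character index from softmax probabilities rescaled by temperature.

    temperature < 1 sharpens the distribution (closer to argmax),
    temperature > 1 flattens it (more random, more typos). Must be > 0.
    """
    rng = rng or np.random.default_rng()
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

rng = np.random.default_rng(0)
probs = [0.7, 0.2, 0.1]   # made-up next-character probabilities
print([sample_with_temperature(probs, 0.5, rng) for _ in range(10)])
```

In a generation loop, this would replace the argmax step that picks the next character; because the choice is no longer deterministic, a repeated window no longer forces the same continuation.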
Thanks in advance!
submitted by /u/big_skapinsky