[Project] Scraping TV show episode data from IMDb + Wikipedia and fine-tuning GPT-2 to generate episode summaries

I created a repo that scrapes IMDb and Wikipedia for episode summaries of a given TV show, plus some code that lets you fine-tune GPT-2 on this data and generate similar summaries.


Star Trek:

  • “The Borg, having escaped from their planet, attack Voyager and attempt to use it as a base of operations. Torres and Janeway beam down to a planet where they encounter a Klingon ship, only to find they are being followed by a mysterious alien.”

  • “When Captain Picard and Data beam back to Earth they find a planet with almost no life and little or no technology. They soon learn that the planet is actually a replica of Earth from the 20th century and the inhabitants of that replica are all slaves.”

  • “Sisko and a Klingon ambassador are kidnapped by Jem’Hadar rebels in hopes they will join the Dominion War in a war to the last man.”

  • “Voyager is trapped on a planet that, while being controlled by humans, has become more or less like another Earth planet when the inhabitants died out.”

  • “Voyager is attacked by Borg space drones. One of the drones is killed, while another attempts to kill the Borg from Voyager but is shot by Chakotay. Voyager is ordered to destroy the drones before they destroy everything on the ship. Meanwhile the Doctor has a dream of his mother, Tuvok.”

  • “Captain Kirk and his first officer take a shuttle from the holodeck. They find a man named Michael Kirk who has been transported to the holodeck where he is playing a holo-created version of themselves. Michael has the same freckles as them, but he speaks very little English so the two of them have to learn to communicate.”

  • “Odo is about to marry a woman who is his mother.”

  • “Kirk and Spock beam down to a planet of genetically enhanced people where the inhabitants have developed a new culture, an advanced technology for space-faring. They take along the Doctor, who is reluctant to be part of such a project, particularly when he sees some of the people in the planet who seem to be more advanced than everyone else in the area, and is troubled when Kirk asks about a race of beings that seem to be very intelligent with the exception of one, who seems to be extremely rude to those around him”

Game of Thrones:

  • “Tyrion’s army is defeated at the Twins. Stannis is defeated at the Twins. Jon Snow takes Stannis prisoners. Daenerys Targaryen plans to conquer Westeros and sends her army to the North.”

  • “Arya is rescued by the survivors and she tells about the events at the Eyrie. Catelyn is forced to deal with the aftermath of the Great River Crossing incident, which leaves her with a scar that she has to hide from the Lannister family. Jon and the Night’s Watch are attacked by Stannis’ forces as he makes his escape, and he meets an unexpected visitor. Meanwhile, the Ironborn continue to fight against the Riverlands in the south.”

  • “Theon has been captured and killed by the Boltons but Sam has escaped. Tyrion advises Margaery about her sister and she agrees to travel to meet her. Jon decides to go with Rickard to the Wall. Daenerys has a discussion with the Seven Kingdoms’ governors but they don’t seem interested.”

  • “Bran leaves Castle Black and takes Arya north. Jon is attacked by a group of Wildlings and saves Bran’s life by striking a nerve with him. Tyrion decides to take a different route through the wild places to escape. Theon is captured.”

  • “Cersei asks Jaime to stay in Winterfell with her, but he says Tyrion has other plans and that Sansa must accompany Jaime. Theon returns to King’s Landing after being attacked by the wildlings. At the Eyrie, Joffrey and Varys try to get Jorah the Dreadfort ready for a siege.”

  • “Tyrion is caught by surprise by an unexpected visitor at the Nightfort, his first, and his first since escaping King’s Landing. He is surprised by Jaime’s presence and he immediately asks for his leave. Jaime refuses. When Tyrion is about to be imprisoned, he is saved by Podrick. Cersei tries to convince Tyrion to accept the pardon but he declines the request. Jaime and Sansa decide to marry and Tyrion marries Margaery.”

  • “Tyrion, Daenerys, and Rickon meet with the High Sparrow for advice on the future; the Lannister king asks Cersei and her dragons to help him reclaim the Iron Islands; Stannis confronts the Ironborn on the road; the Seven Kingdoms have an uneasy truce.”

  • “Arya Stark receives news that the Lannister army has defeated and captured the Ironborn. She also finds Joffrey, Tywin and Littlefinger dead. At Castle Black, a few loyal members of the Watch are being murdered by Stannis Baratheon.”

There are more examples in the repo's README.

The scraping scripts should generally work for any popular TV show (but there might be some edge cases I am unaware of). Give it a try if you are interested.

Also, shoutout to /u/som3a982 who posted a similar project a few hours ago. By adding a few lines of code to his notebook, you can probably tune his model on the data scraped by my spiders.
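To give a rough idea of what those few lines might look like, here is a minimal sketch that turns scraped summaries into a single training text. It assumes the spiders write one JSON object per line with a `summary` field; both that layout and the field name are assumptions for illustration, so check the repo for the real output format. The `build_corpus` helper is hypothetical and not part of either project.

```python
import json

# GPT-2's end-of-text token, commonly used to separate training documents.
EOT = "<|endoftext|>"


def build_corpus(jsonl_lines, field="summary"):
    """Join scraped summaries into one training string.

    `field` is a hypothetical key name; adjust it to whatever the
    spiders actually emit.
    """
    summaries = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        text = (record.get(field) or "").strip()
        if text:  # skip records with a missing or empty summary
            summaries.append(text)
    # Put the separator on its own line between consecutive summaries.
    return ("\n" + EOT + "\n").join(summaries)


if __name__ == "__main__":
    # Made-up records, only to show the expected input shape.
    sample = [
        '{"summary": "First made-up summary."}',
        '{"summary": "Second made-up summary."}',
    ]
    print(build_corpus(sample))
```

The resulting string can be written to a plain text file and passed to whatever fine-tuning script or notebook you are using, as long as it accepts a raw text file as input.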

submitted by /u/zergling10000001