

Category: Reddit MachineLearning

[Project] Scraping TV show episode data from IMDb + Wikipedia and fine-tuning GPT-2 to generate episode summaries

I created a repo that scrapes IMDb and Wikipedia for the episode summaries of a given TV show, plus some code that lets you fine-tune GPT-2 on this data and generate similar summaries.

https://github.com/karoly-hars/gpt2_episode_summary_generator/
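GPT-2 fine-tuning scripts commonly take a single plain-text file in which separate documents are delimited by GPT-2's `<|endoftext|>` token. Here is a minimal sketch of turning scraped summaries into such a text, assuming that format. The `build_training_text` helper and its `min_words` filter are hypothetical illustrations, not code from the repo.

```python
EOT = "<|endoftext|>"  # GPT-2's end-of-text token, used here as a document separator

def build_training_text(summaries, min_words=5):
    """Join scraped episode summaries into one training string (hypothetical helper)."""
    seen = set()
    chunks = []
    for s in summaries:
        s = " ".join(s.split())  # collapse stray whitespace/newlines left over from HTML scraping
        if len(s.split()) < min_words or s in seen:
            continue  # drop very short stubs and exact duplicates (IMDb and Wikipedia may overlap)
        seen.add(s)
        chunks.append(s)
    # One summary per line, each terminated by the separator token.
    return "\n".join(c + EOT for c in chunks)
```

The resulting string can be written to a `.txt` file and passed to whatever fine-tuning script is in use; deduplication matters here because the same episode can appear in both sources.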

Examples:

Star Trek:

  • “The Borg, having escaped from their planet, attack Voyager and attempt to use it as a base of operations. Torres and Janeway beam down to a planet where they encounter a Klingon ship, only to find they are being followed by a mysterious alien.”

  • “When Captain Picard and Data beam back to Earth they find a planet with almost no life and little or no technology. They soon learn that the planet is actually a replica of Earth from the 20th century and the inhabitants of that replica are all slaves.”

  • “Sisko and a Klingon ambassador are kidnapped by Jem’Hadar rebels in hopes they will join the Dominion War in a war to the last man.”

  • “Voyager is trapped on a planet that, while being controlled by humans, has become more or less like another Earth planet when the inhabitants died out.”

  • “Voyager is attacked by Borg space drones. One of the drones is killed, while another attempts to kill the Borg from Voyager but is shot by Chakotay. Voyager is ordered to destroy the drones before they destroy everything on the ship. Meanwhile the Doctor has a dream of his mother, Tuvok.”

  • “Captain Kirk and his first officer take a shuttle from the holodeck. They find a man named Michael Kirk who has been transported to the holodeck where he is playing a holo-created version of themselves. Michael has the same freckles as them, but he speaks very little English so the two of them have to learn to communicate.”

  • “Odo is about to marry a woman who is his mother.”

  • “Kirk and Spock beam down to a planet of genetically enhanced people where the inhabitants have developed a new culture, an advanced technology for space-faring. They take along the Doctor, who is reluctant to be part of such a project, particularly when he sees some of the people in the planet who seem to be more advanced than everyone else in the area, and is troubled when Kirk asks about a race of beings that seem to be very intelligent with the exception of one, who seems to be extremely rude to those around him”

Game of Thrones:

  • “Tyrion’s army is defeated at the Twins. Stannis is defeated at the Twins. Jon Snow takes Stannis prisoners. Daenerys Targaryen plans to conquer Westeros and sends her army to the North.”

  • “Arya is rescued by the survivors and she tells about the events at the Eyrie. Catelyn is forced to deal with the aftermath of the Great River Crossing incident, which leaves her with a scar that she has to hide from the Lannister family. Jon and the Night’s Watch are attacked by Stannis’ forces as he makes his escape, and he meets an unexpected visitor. Meanwhile, the Ironborn continue to fight against the Riverlands in the south.”

  • “Theon has been captured and killed by the Boltons but Sam has escaped. Tyrion advises Margaery about her sister and she agrees to travel to meet her. Jon decides to go with Rickard to the Wall. Daenerys has a discussion with the Seven Kingdoms’ governors but they don’t seem interested.”

  • “Bran leaves Castle Black and takes Arya north. Jon is attacked by a group of Wildlings and saves Bran’s life by striking a nerve with him. Tyrion decides to take a different route through the wild places to escape. Theon is captured.”

  • “Cersei asks Jaime to stay in Winterfell with her, but he says Tyrion has other plans and that Sansa must accompany Jaime. Theon returns to King’s Landing after being attacked by the wildlings. At the Eyrie, Joffrey and Varys try to get Jorah the Dreadfort ready for a siege.”

  • “Tyrion is caught by surprise by an unexpected visitor at the Nightfort, his first, and his first since escaping King’s Landing. He is surprised by Jaime’s presence and he immediately asks for his leave. Jaime refuses. When Tyrion is about to be imprisoned, he is saved by Podrick. Cersei tries to convince Tyrion to accept the pardon but he declines the request. Jaime and Sansa decide to marry and Tyrion marries Margaery.”

  • “Tyrion, Daenerys, and Rickon meet with the High Sparrow for advice on the future; the Lannister king asks Cersei and her dragons to help him reclaim the Iron Islands; Stannis confronts the Ironborn on the road; the Seven Kingdoms have an uneasy truce.”

  • “Arya Stark receives news that the Lannister army has defeated and captured the Ironborn. She also finds Joffrey, Tywin and Littlefinger dead. At Castle Black, a few loyal members of the Watch are being murdered by Stannis Baratheon.”

There are a bunch of other examples in the README of the repo.

The scraping scripts should generally work for any popular TV show (though there might be some edge cases I am unaware of). Give it a try if you are interested.

Also, shoutout to /u/som3a982 who posted a similar project a few hours ago. By adding a few lines of code to his notebook, you can probably tune his model on the data scraped by my spiders.

submitted by /u/zergling10000001

Efficient approximate nearest neighbors for a constantly changing dataset? [P]

We are currently working on a Big Data project where our data points are constantly subject to small changes and new data points are constantly being added. How would we go about doing k-NN under these conditions? All the libraries I could find (these ones) assume a static dataset. I'm wondering if someone here has run into a similar problem and could provide some insight. Thanks!
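One family of approaches that tolerates constant mutation is hashing-based ANN, because each point's bucket depends only on that point, so an update is just a delete followed by a reinsert with no global rebuild. The sketch below uses random-hyperplane LSH for cosine similarity. This technique is swapped in as an illustration, not taken from any library in the post, and the class name and the `n_tables`/`n_bits` defaults are assumptions. It is written in pure Python for readability, not speed.

```python
import math
import random
from collections import defaultdict

class DynamicCosineLSH:
    """Random-hyperplane LSH that supports insert, update and delete in place (sketch)."""

    def __init__(self, dim, n_tables=8, n_bits=6, seed=0):
        rng = random.Random(seed)
        # Each table gets its own set of n_bits random hyperplanes.
        self.planes = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
                       for _ in range(n_tables)]
        self.tables = [defaultdict(set) for _ in range(n_tables)]
        self.vectors = {}  # id -> current vector
        self.keys = {}     # id -> list of bucket keys, one per table

    def _hash(self, planes, v):
        # Bit i records which side of hyperplane i the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes)

    def upsert(self, item_id, v):
        if item_id in self.vectors:
            self.delete(item_id)  # an update is a delete plus a reinsert
        keys = [self._hash(planes, v) for planes in self.planes]
        for table, key in zip(self.tables, keys):
            table[key].add(item_id)
        self.vectors[item_id] = list(v)
        self.keys[item_id] = keys

    def delete(self, item_id):
        for table, key in zip(self.tables, self.keys.pop(item_id)):
            table[key].discard(item_id)
            if not table[key]:
                del table[key]  # keep empty buckets from accumulating
        del self.vectors[item_id]

    def query(self, v, k=1):
        # Candidates: every item sharing a bucket with v in at least one table.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates |= table.get(self._hash(planes, v), set())

        def cos(u, w):
            dot = sum(a * b for a, b in zip(u, w))
            nu = math.sqrt(sum(a * a for a in u))
            nw = math.sqrt(sum(b * b for b in w))
            return dot / (nu * nw) if nu and nw else 0.0

        # Re-rank the (small) candidate set by exact cosine similarity.
        return sorted(candidates, key=lambda i: -cos(v, self.vectors[i]))[:k]
```

More tables raise recall at the cost of memory; more bits per table make buckets smaller and queries cheaper but miss more neighbors.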

submitted by /u/mach20learning

How to publish efficiently? [D]

...sort of. So I do my research and read papers. And read more papers. At the end of the day, it feels like all I’m doing is reading papers. I’d really like to start publishing efficiently. To the PhD students who manage to publish at at least two major conferences every year: how do you do it? When you read a paper, how do you go from its conclusions/results to ideas for improvements? I’d love to hear any ideas other researchers have. I do NLP, if that matters.

submitted by /u/KevinisChang_

[D] How to deal with bags of images?

We are creating a classifier that takes bags of images as input and outputs binary labels *per bag*. A bag can contain between 2 and 25 images (photos of the same object from different angles), and we must output a fixed-length binary vector for each bag.

What we are using right now:

  1. We filter out the 5% of bags that have too many images, which leaves a maximum bag size of 13.
  2. We pad the bags with fewer than 13 images with grey images (we could also repeat some of the images).
  3. The classifier is fit to predict the binary label *for each image*. So for a single bag, the input has shape (13 x 224 x 224 x 3) and the output has shape (13 x n), where each image is 224 x 224 and n is the length of the binary vector.
  4. We make predictions for every image of every bag in the test set.
  5. We use a heuristic to aggregate the 13 prediction vectors into a single one: a simple maximum, some sort of mean, etc.
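Steps 2 and 5 can be sketched in NumPy. One detail worth making explicit is a mask of which slots hold real images, so that predictions on grey padding never leak into the max or mean. This is a sketch under assumptions: the grey value 0.5 (for images scaled to [0, 1]) and the function names are illustrative, not from the post.

```python
import numpy as np

MAX_BAG = 13  # maximum bag size after filtering, as described above
GREY = 0.5    # assumed padding pixel value for images scaled to [0, 1]

def pad_bag(images, max_bag=MAX_BAG):
    """Pad a (k, H, W, C) bag to (max_bag, H, W, C) with grey images.
    Returns the padded array and a boolean mask that is True for real images."""
    k = images.shape[0]
    if k > max_bag:
        raise ValueError(f"bag has {k} images, more than {max_bag}")
    pad = np.full((max_bag - k,) + images.shape[1:], GREY, dtype=images.dtype)
    mask = np.arange(max_bag) < k
    return np.concatenate([images, pad], axis=0), mask

def aggregate(preds, mask, how="max"):
    """Collapse per-image predictions (max_bag, n) into one bag vector (n,),
    using only the rows that correspond to real images."""
    real = preds[mask]
    if how == "max":
        return real.max(axis=0)
    if how == "mean":
        return real.mean(axis=0)
    raise ValueError(f"unknown aggregation: {how}")
```

Max aggregation suits features that are visible from only one or two angles, while mean aggregation dilutes such a signal across all the images in the bag.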

This pipeline feels unsatisfactory because the model never uses all the images of a bag at once. The signals also seem noisy, since most individual images, if labeled by a human, would just get zero vectors.

We also have two ideas we will try:

  1. Make the model operate directly on bags of images. For example, with a batch size of 16, the pipeline described above gives an input of shape 208 x 224 x 224 x 3 and an output of shape 208 x n. Instead, the input could be 16 x 13 x 224 x 224 x 3 and the output 16 x n, using 3D convolutions instead of 2D ones. This seems a lot cleaner. However, the images are not “similar”: the frames of a video are “similar” because the angle changes only slightly from frame to frame, and that is not the case here. Maybe we could start with several consecutive 2D convolution layers before moving on to 3D layers? This still feels wrong, but it’s hard for me to explain why.
  2. Using the pipeline above, we get a label of 13 x n for each bag. Each 1 x n row is wrong, since most of them should be mostly zeros (the features we are looking for are small and are usually visible from only one or two angles). So we could use some heuristic to find the “true” label of each individual photo. For this idea, could you recommend some papers or approaches?
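The reshape arithmetic in idea 1 (16 x 13 = 208) also allows a variant that avoids 3D convolutions: run an ordinary per-image 2D encoder on all 208 images, reshape the features back to 16 x 13, and pool across the bag axis. This is the max-pooling form of multiple-instance learning. It is swapped in here as an alternative rather than taken from the post. Because max pooling ignores order, it does not assume that neighboring images are similar the way 3D convolutions do. The NumPy sketch below uses stand-in `encode`/`head` callables, which are hypothetical placeholders for a real CNN and classifier head.

```python
import numpy as np

def bag_forward(bags, mask, encode, head):
    """Bag-level forward pass (sketch).
    bags:   (B, K, H, W, C) padded bags
    mask:   (B, K) bool, True for real (non-padding) images
    encode: per-image feature extractor, (N, H, W, C) -> (N, d), e.g. a 2D CNN
    head:   per-bag classifier, (B, d) -> (B, n)"""
    B, K = bags.shape[:2]
    flat = bags.reshape((B * K,) + bags.shape[2:])       # (B*K, H, W, C), e.g. 16*13 = 208 images
    feats = encode(flat).reshape(B, K, -1)               # back to (B, K, d)
    feats = np.where(mask[..., None], feats, -np.inf)    # padding can never win the max
    pooled = feats.max(axis=1)                           # (B, d), independent of image order
    return head(pooled)                                  # (B, n), one label vector per bag
```

With a deep-learning framework, the same reshape, encode, reshape and pool steps apply, and the whole model trains end to end on the per-bag labels. This also sidesteps the noisy per-image labels described in idea 2.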

Do you have any tips, tricks, ideas to try, papers to read?

Thank you.

submitted by /u/Icko_

[Project] Need some advice on my Final School Project (Sudoku solving with AI)

Hey, I am an 18-year-old student from Slovenia, a few months away from graduating.

To graduate and pass this final year, I have to do a project. I wanted to do a game first, but that was already taken, so the only option left was an artificial intelligence project: solving Sudoku with the help of AI.

I have a lot of questions and don’t really know where to even start. Does anyone know any good sources I could learn from (about machine learning...)?

One of my main questions is whether anyone knows a good C++ library for machine learning, and one for GUI programming (like Swing for Java).
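Before reaching for machine learning, a common baseline for this kind of project is to treat Sudoku as a constraint-satisfaction problem and solve it with backtracking search, a classic (non-ML) AI technique. Here is a minimal sketch in Python for brevity. The post asks about C++, and the same recursion translates directly. The function names are illustrative.

```python
def find_empty(grid):
    """Return (row, col) of the first empty cell (0), or None if the grid is full."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                return r, c
    return None

def allowed(grid, r, c, d):
    """Check that digit d does not already appear in row r, column c, or the 3x3 box."""
    if d in grid[r]:
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d for i in range(br, br + 3) for j in range(bc, bc + 3))

def solve(grid):
    """Fill a 9x9 grid (list of lists, 0 = empty) in place; return True if solved."""
    cell = find_empty(grid)
    if cell is None:
        return True
    r, c = cell
    for d in range(1, 10):
        if allowed(grid, r, c, d):
            grid[r][c] = d
            if solve(grid):
                return True
            grid[r][c] = 0  # undo this choice and try the next digit
    return False
```

A solver like this gives a reference to compare any learned approach against, both for correctness and for speed.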

submitted by /u/UnlikelyDriver

[D] How are models actually used in practice?

I don’t have a lot of experience in industry, and I would really like to hear from people with practical experience how things are done in practice. Not experimentation, but actual usage of models: anything from classification to regression. I just want to hear from people who use these things day to day.

Also, are there books that discuss case studies of models that made it into production? I’m looking to move beyond the “regressing housing prices” examples to actual real-world examples of models. Maybe a book or an article that discusses these. Thanks!

submitted by /u/Minimum_Zucchini

[Discussion] Examples of mis-specifying optimization objectives causing unpleasant outcomes

In Stuart Russell’s book ‘Human Compatible’, he gives an example of a social platform that specified maximizing click-through rate as its objective. This not only promoted an echo chamber effect but in fact slowly modified people’s preferences so that they became more predictable. In the process, it drove more extreme viewpoints, because it is easier to predict what content will be clicked when your viewpoints are at one extreme of the spectrum.

I find this example complex and interesting, and I’m wondering what other real-world examples there are.

submitted by /u/dbcrib

[R] Research Survey about Security in Machine Learning

Hey everybody,

We at Fraunhofer AISEC are studying how aware developers are of security in machine learning implementations. To that end, we are currently running a survey of ML developers to capture the current state of practice.

If you are a developer working with ML and have ~15 minutes of free time, we kindly ask you to take part in our anonymous online survey:
https://websites.fraunhofer.de/ML_security/index.php/232539?lang=en

Our research is conducted in cooperation with the Freie Universität Berlin. For more information, visit the following link:

https://www.mi.fu-berlin.de/inf/groups/ag-idm/projects/SecureMachineLearning/index.html

Thank you in advance!

submitted by /u/oliver133322