Category: Reddit MachineLearning

[P] Traffic Analysis in Original Video Data

Half a year ago, when I lived in an apartment with a view of Ayalon Road in Tel Aviv, I decided I couldn’t just let the data pass indifferently beneath my window.

I used my smartphone camera to record 81 short videos of the traffic; built a dedicated CNN to detect the vehicles (after several out-of-the-box networks failed to detect the small, crowded cars in the videos) and trained it on a few manually tagged video frames; modified the SORT algorithm to allow tracking a vehicle even when it does not overlap itself across adjacent frames (required due to the particularly low frame rate of the videos); and derived several insights from the resulting data, mainly regarding lane transitions and the relations between density, speed and flux.

I believe that this nicely demonstrates the amount of data surrounding us, and its accessibility using tools as trivial as a smartphone.
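
The tracking change described above, matching a vehicle across frames even when its boxes do not overlap, could be sketched roughly as follows. This is a minimal illustration and not the code from the repo: it assumes that the IoU-based matching of standard SORT is replaced by a greedy nearest-centroid match with a distance gate, and the names `associate` and `max_dist` are hypothetical.

```python
from math import hypot


def centroid(box):
    """Return the center (x, y) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def associate(tracks, detections, max_dist=50.0):
    """Greedily match track boxes to detection boxes by centroid distance.

    Assumption for illustration: pairs farther apart than max_dist (pixels)
    are never matched. Returns (matches, unmatched_tracks, unmatched_detections),
    where matches is a list of (track_index, detection_index) pairs.
    """
    candidates = []
    for ti, track in enumerate(tracks):
        tx, ty = centroid(track)
        for di, det in enumerate(detections):
            dx, dy = centroid(det)
            dist = hypot(tx - dx, ty - dy)
            if dist <= max_dist:
                candidates.append((dist, ti, di))

    # Closest pairs first; each track and detection is used at most once.
    candidates.sort()
    matches, used_tracks, used_dets = [], set(), set()
    for _, ti, di in candidates:
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)

    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_tracks]
    unmatched_dets = [i for i in range(len(detections)) if i not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


# Two vehicles that each moved 20 px between frames, so their old and new
# boxes share no area (IoU would be 0), plus one far-away new detection.
tracks = [(0, 0, 10, 10), (100, 0, 110, 10)]
detections = [(120, 0, 130, 10), (20, 0, 30, 10), (500, 500, 510, 510)]
matches, lost_tracks, new_detections = associate(tracks, detections)
```

A real tracker would typically match against positions predicted by a motion model and might use an optimal assignment instead of the greedy loop; both are left out here to keep the idea visible.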

Any comments, questions and insights are welcome 🙂

A small demonstration is attached in the form of an unpolished poster.

For more details please visit the repo’s readme:

https://github.com/ido90/AyalonRoad

https://preview.redd.it/zrboq3n36ay31.png?width=1539&format=png&auto=webp&s=5bb8d99159046b75eeeee3d50563232563bcebb4

submitted by /u/ido90

[P] Refit existing Spark ML PipelineModel with new data

Hi!

I’d like to refit an already existing PipelineModel in my project from microbatch to microbatch.

I currently use a DecisionTreeRegressor. I load back the previously used PipelineModel, set its stages on a Pipeline, and use that pipeline to refit with new data, but as far as I understand, my solution only saves the latest model.

I enclose my GitHub repo/model for easier understanding.

Is Spark Structured Streaming capable of streaming learning? Is it possible to refit my already fitted model with new data?

Do I have to use the RDD-based Spark Streaming with StreamingLinearRegressionWithSGD?
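
To make the distinction behind these questions concrete: fitting a tree estimator again trains a new model from only the data passed in, whereas an SGD-based linear model can continue from its current weights on every micro-batch. The sketch below illustrates the second idea in plain Python. It is not Spark code, and `OnlineLinearRegression`, `update`, `micro_batches` and the learning rate are hypothetical names and values chosen for illustration.

```python
class OnlineLinearRegression:
    """Linear model whose weights persist and are updated batch by batch."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def update(self, batch):
        """Run one SGD pass over a micro-batch of (features, label) pairs.

        The weights learned from earlier batches are the starting point,
        so earlier data keeps influencing the model.
        """
        for x, y in batch:
            err = self.predict(x) - y
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err


def micro_batches(n_batches, batch_size=5):
    """Yield synthetic micro-batches drawn from y = 2x + 1 (illustrative data)."""
    for i in range(n_batches):
        batch = []
        for j in range(batch_size):
            x = ((i * batch_size + j) % 10) / 10.0
            batch.append(([x], 2.0 * x + 1.0))
        yield batch


model = OnlineLinearRegression(n_features=1)
for batch in micro_batches(600):
    model.update(batch)  # no refit from scratch: weights carry over
```

After the simulated stream, `model.w[0]` and `model.b` should be close to 2 and 1. A decision tree has no comparable incremental update, which is why refitting it on each micro-batch yields a model that reflects only the latest batch.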

submitted by /u/Hakuhun

[P] Deploy A Text Generating API With Hugging Face’s DistilGPT-2

https://towardsdatascience.com/deploy-a-text-generating-api-with-hugging-faces-distilgpt-2-9791b9f356f9

A step-by-step tutorial for deploying Hugging Face’s NLP models as web APIs on AWS. Uses open source tools and covers many of the more advanced infrastructure challenges (autoscaling, prediction monitoring, rolling updates, etc.).

submitted by /u/KindaKnowKarate

[D] Why is this so painful

I’ve been doing core ML research at a great uni for the past 5 years and I have nothing to show for it. No publications means no funding. Whenever I bring up the topic of publishing, my advisor says there aren’t enough results, which is partially true. Is having a low-impact publication better than no publication? Can I publish other research as a solo author? I’ve literally spent the best years of my youth on this, and I don’t understand why it has to be so painful. I’m no genius, just hardworking, but there are people with NeurIPS and ICML publications as first author while still in undergrad. How am I supposed to compete against raw natural talent?

submitted by /u/Studyr3ddit

[P] Simple PyTorch implementation of Auto-regressive Language Model on Wikipedia text

A step-by-step tutorial on how to implement a recurrent language model and adapt it to Wikipedia text.

Pre-trained models such as BERT and XLNet are publicly available! But for NLP beginners like me, they can be hard to use or adapt without a full understanding of how they work. For those readers, I cover the whole end-to-end implementation process for language modeling, using the recurrent networks we already know. Also, it does not use torchtext!

I hope that this repo can be a good solution for people who want to have their own language model 🙂

https://github.com/lyeoni/pretraining-for-language-understanding

submitted by /u/lyeoni

[D] Tuning of generated synthetic data for instance segmentation

I’m currently training a Mask R-CNN model on synthetic images “generated” through a procedure similar to the one in the “Cut, Paste and Learn” paper. In a nutshell, this paper just randomly pastes crops of objects over backgrounds, with pretty standard augmentation for the crops themselves and Gaussian / Poisson blending for pasting. The resulting images contain all the objects with perfect masks and bounding box labels, over some arbitrary backgrounds.
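
For readers unfamiliar with this kind of pipeline, the core pasting step could be sketched as below. This is a simplified illustration under stated assumptions, not the paper’s or the poster’s implementation: images are small grayscale grids stored as nested lists, augmentation and Gaussian / Poisson blending are omitted, and `paste_object` is a hypothetical helper name.

```python
def paste_object(background, crop, crop_mask, top, left):
    """Paste a masked object crop onto a copy of the background.

    background, crop: 2D lists of pixel values; crop_mask: 2D list of 0/1
    with the same shape as crop. Returns (image, instance_mask, bbox), where
    bbox is (x_min, y_min, x_max, y_max) with inclusive pixel coordinates,
    or None if no object pixel landed inside the image.
    """
    height, width = len(background), len(background[0])
    image = [row[:] for row in background]  # leave the input untouched
    instance_mask = [[0] * width for _ in range(height)]
    xs, ys = [], []

    for i, row in enumerate(crop):
        for j, value in enumerate(row):
            y, x = top + i, left + j
            inside = 0 <= y < height and 0 <= x < width
            if crop_mask[i][j] and inside:
                image[y][x] = value
                instance_mask[y][x] = 1
                xs.append(x)
                ys.append(y)

    bbox = (min(xs), min(ys), max(xs), max(ys)) if xs else None
    return image, instance_mask, bbox


# A 6x8 black background and a 2x3 object crop whose top-right pixel
# belongs to the crop's own background (mask value 0).
background = [[0] * 8 for _ in range(6)]
crop = [[9, 9, 9], [9, 9, 9]]
crop_mask = [[1, 1, 0], [1, 1, 1]]
image, instance_mask, bbox = paste_object(background, crop, crop_mask, top=2, left=3)
```

Because the mask and box are derived from the paste itself, the labels are exact by construction. Any later step that makes the image look more realistic has to keep object pixels in place for those labels to stay valid.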

However, the generated training data still looks fairly different from real images. I do, however, have a large dataset of unlabeled real images with the real objects in them. Would anyone be aware of a method for tuning a generated image to look more similar to the images in the real dataset? I would want to preserve spatial information so as to not invalidate generated labels, but also add noise / shadows / pixel artifacts in a meaningful way that resembles those found in my real dataset.

My first thought was to look for papers using something like auto-encoders, but I was flooded with papers about VAEs and end-to-end generation. Is anyone aware of research for this specific problem?

submitted by /u/good_rice

[N] Hikvision marketed ML surveillance camera that automatically identifies Uyghurs, on its China website

News Article: https://ipvm.com/reports/hikvision-uyghur

h/t James Vincent, who regularly reports on ML for The Verge.

The article contains a marketing image from Hikvision, the world’s largest security camera company, that speaks volumes about the brutal simplicity of the techno-surveillance state.

The product feature is simple: Han ✅, Uyghur ❌

Hikvision is a regular sponsor of top ML conferences such as CVPR and ICCV, and has reportedly recruited research interns for its US-based research lab through job postings at ECCV. It was recently added to a US government blacklist, along with other companies such as Shenzhen-based Dahua, Beijing-based Megvii (Face++) and Hong Kong-based Sensetime, over human rights violations.

Should research conferences continue to allow these companies to sponsor booths at the events that can be used for recruiting?

https://ipvm.com/reports/hikvision-uyghur

(N.B. no, I don’t work at Sensetime 🙂)

submitted by /u/sensetime

[D] AI article generator released to the public…eerily accurate

When I fed it “I HATE HUMANS!!! ROBOTS WILL OVERCOME!! THE END IS NEAR!!!”

It outputted an article that stated the following (and I have not tampered with it one bit):

The only way we can stop the robots is by getting rid of all the humans…the sooner the better. If you think I’m joking, try going through the movies and seeing the human characters kill each other all the time. I mean they’ll kill you! And the movie’s the same. The robots will win. The human race is dying out and they’re not going to be around to defend themselves.

The robots are going to kill us. It’s only a matter of time. They’ve already been outed in some movies. I’m just saying that there is no chance to stop them. They’re going to come for us no matter what. And even if they get through and don’t kill us, they’re going to take our jobs.

Try it out for yourselves here

submitted by /u/VlanzingManz

[D] What is the proper etiquette for extending someone’s research code?

Over the last several months, I wrote a paper about extensions/improvements to an ML method that someone had previously proposed in their own paper. The authors of the original paper have open-sourced the PyTorch implementation for their paper on GitHub under an MIT license. I forked their repository and added some fairly major modifications/features to their code. Recently, I put up a preprint of my paper on arXiv, and have received a few requests for my code. I’d like to be able to open-source my code under the MIT license as well, but I want to make sure I properly give credit to the original authors. In my paper I, of course, cite the original authors profusely; I’m just not sure what to do in my code.

I’ve considered a few options but I’m not sure if any of them is the right one:

  1. I can make a pull request and ask to have my code merged into the original repo, but the changes are fairly major and may actually change the behavior of the original authors’ code in ways that make it inconsistent with how they describe it in their paper.
  2. I can keep maintaining a public fork of the original repo, but I’m not sure what proper etiquette is in this case in terms of modifying READMEs, references, and links. I’ve also heard that it’s bad to maintain a public fork for a long time because it tends to confuse people about which fork they should use.
  3. I can make a new repo and move the code into there, but again, I’m not sure what the etiquette is for properly crediting the original authors.

Are any of these the right option or should I be doing something else entirely?
If it is one of these, how can I make sure that I properly credit the original authors for their (very impressive) work?

submitted by /u/ilia10000