Category: Reddit MachineLearning

[P] PyChubby – Face Warping

What does it do?

Change facial expressions and shapes with zero effort.

How exactly?

Give it a photo and define what to do with the faces (smile, open eyes, shrink, …); pychubby will do the rest.

Features

  • No need to manually specify landmarks
  • Works on any photo with an arbitrary number of faces
  • Can be used for deep learning augmentations
  • … or just creating funny photos/videos

Links

github

blogpost

docs

I would appreciate any feedback, especially regarding the possibility of using it for augmentations in face-recognition-related tasks. Also, if anyone is interested in contributing or suggesting new features, I would be more than happy!

PyChubby in action

submitted by /u/kjanofficial

[D] Analyzing thousands of podcast transcripts – interesting project ideas & best algos?

Hi – we’re a small startup focusing on podcast transcriptions. We are working to make them readable, searchable, etc. We have previously used TF-IDF + LDA topic modeling to extract underlying topics in the corpus and compute related podcasts.
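To make the "related podcasts" step concrete, here is a minimal sketch that uses only the TF-IDF half of that pipeline. It is not the poster's implementation: the LDA step is left out, the tokenizer is a plain whitespace split, and the function names and toy transcripts are made up for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF, so terms that appear everywhere keep a small weight.
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_related(docs, index):
    """Index of the document most similar to docs[index], excluding itself."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], vec), i)
        for i, vec in enumerate(vectors)
        if i != index
    ]
    return max(scores)[1]


# Made-up toy transcripts, for illustration only.
transcripts = [
    "the election and the senate vote on the new tax bill",
    "senate debate on the tax bill and the election campaign",
    "how to train a neural network for speech recognition",
]
```

Calling `most_related(transcripts, 0)` picks the other politics transcript, because the third one shares no terms with the first. In a pipeline like the one described, the same similarity could be computed on LDA topic distributions instead of raw TF-IDF weights.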

Potential ideas for future interesting projects include:

  • automatically identifying ads
  • picking up trends/sentiment on politics/people/companies, etc.
  • finding promo codes
  • auto-generating podcast summaries
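As a rough starting point for the promo-code idea, a regular expression over the transcript text may already catch the simplest cases. This is only a sketch under strong assumptions: the trigger phrases and the "all-caps or digits" code format are hypothetical, and automatic transcripts may not preserve casing at all.

```python
import re

# Hypothetical pattern: a trigger phrase such as "promo code" or "use code",
# followed by an all-caps/digit token. Real ad reads vary far more than this.
PROMO_PATTERN = re.compile(
    r"\b(?:promo code|offer code|use code|enter code)\s+([A-Z0-9]{3,})\b"
)


def find_promo_codes(transcript):
    """Return candidate promo codes in order of appearance, without duplicates."""
    seen = []
    for code in PROMO_PATTERN.findall(transcript):
        if code not in seen:
            seen.append(code)
    return seen
```

A learned ad detector would likely be needed first to narrow the search to ad segments; the regex then only has to handle the narrower phrasing used inside ads.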

What would you find most interesting, and why?

submitted by /u/pbirsinger

[P] Torchelie – PyTorch extended with more optimizers, losses, layers, models, transformations, and even training loops

Hi guys,

I’ve been working on Torchelie for something like two or three months now. Notable additions are:

  • bitempered loss
  • VQ layer from VQ-VAE
  • RAdam
  • SPADE / Conditional BatchNorm
  • Neural Style loss / Deep Dream loss / Feature viz loss
  • Spectral image to reproduce distill.pub’s amazing feature viz results

Anyway, here’s the link: https://github.com/Vermeille/Torchelie/ and the documentation is here: https://torchelie.readthedocs.io/

Criticism and bug reports are absolutely welcome.

Thank you for trying it out 🙂

submitted by /u/Vermeille

How does SOTA compare across languages in NLP? [Discussion]

There has obviously been amazing progress in deep-learning-based NLP/speech technology in recent years. As an English speaker, I was wondering how the state of the art varies across languages.

How good are things like sentiment analysis, question answering, and speech recognition in non-English languages?

Does anyone know of any failure modes in non-English European languages? And are there any good open-source libraries for NLP in subcontinental languages like Hindi and Urdu?

Thanks!

submitted by /u/Razcle

[D] Convergent evolution in next-generation neural networks?

In the research quest for next-generation neural networks, it seems like two “schools” of thinking are converging to something somewhat similar even after approaching it from different angles.

I’m referring to the HTM/sparse coding/bio-inspired group and the “classical” deep learning group.

Historically, the principles espoused by the former group (e.g. Numenta, Ogma Neo, Sparsey) have not gotten much traction in the ML community due to lack of experimental results.

However, more recently, approaches from within the deep learning community have begun to somewhat resemble the former’s approach(es). For example, OpenAI is now intensely interested in extremely sparse networks ( https://supercomputersfordl2017.github.io/Presentations/SmallWorldNetworkArchitectures.pdf ) and Geoff Hinton is working on capsules, which are at least partially inspired by cortical columns ( https://numenta.com/blog/2017/12/18/comparing-capsules-with-htm/ ) and have some similarities.

Just as interestingly, it appears that some non-backpropagation local learning algorithms, such as Hebbian learning ( https://arxiv.org/abs/1908.08993 ) can actually scale to CIFAR and a small version of ImageNet. Of course, these results are very preliminary, but are at least somewhat interesting.
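For readers unfamiliar with local learning rules, here is a minimal sketch of what "non-backpropagation" means in the simplest case. It uses Oja's rule, a textbook normalized variant of Hebbian learning for a single linear neuron; this is an assumption chosen for illustration and is not the algorithm from the linked paper. The update uses only the neuron's own input and output, with no error signal propagated from other layers.

```python
def oja_update(weights, x, lr=0.1):
    """One Hebbian step with Oja's normalization for a single linear neuron."""
    # Neuron output: plain dot product of weights and input.
    y = sum(w * xi for w, xi in zip(weights, x))
    # Hebbian term (y * xi) strengthens co-active connections;
    # the decay term (y * y * w) keeps the weight norm bounded.
    return [w + lr * y * (xi - y * w) for w, xi in zip(weights, x)]


def train(weights, samples, epochs=1, lr=0.1):
    """Apply the local update to every sample, for the given number of epochs."""
    for _ in range(epochs):
        for x in samples:
            weights = oja_update(weights, x, lr)
    return weights
```

When the same input direction is presented repeatedly, the weight vector rotates toward that direction and settles at unit length, which is the sense in which the rule extracts structure without any gradient from a loss.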

So I’d like to hear people’s thoughts on this. Perhaps there is an interesting convergent evolution phenomenon where the approaches to next-generation NNs end up being somewhat similar.

submitted by /u/darkconfidantislife

[D] Predicting value of feature/model engineering on very large data sets?

Setup: sizeable corpus (1TB+) of text data. The problem space roughly generalizes into a seq2seq model.

Given the size and the required parameter space to fully exploit the data, retraining the model against the full data set is (understandably) very expensive (tens of thousands of dollars per full training run).

Iterative pre-processing of the data is of course pricey as well.

Problem: How can we estimate the effect of some new feature/model-engineering without re-training on the entire data set?

Obviously we can do training runs at smaller amounts of data and/or with small parameter sizes. This is of course a good start; if something doesn’t work at smaller volumes of data, it usually isn’t helpful at larger volumes of data. But, at scale, enough data tends to wash away the value of many types of feature engineering and/or notionally more clever models.
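One way to make the small-scale runs say something about full scale is to fit a learning curve to them and extrapolate. The sketch below assumes validation error follows a power law in data size, error = a * size^(-b), which is a common empirical assumption rather than a guarantee. The function names and all numbers are made up for illustration and are not results from this project.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of error = a * size**(-b) in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def extrapolate(a, b, size):
    """Predicted error at a given data size under the fitted power law."""
    return a * size ** (-b)


# Made-up validation errors from hypothetical small-scale runs.
sizes = [1e6, 1e7, 1e8]
baseline_errors = [0.40, 0.30, 0.225]
variant_errors = [0.34, 0.27, 0.215]

full_size = 1e10  # hypothetical full data volume
base_a, base_b = fit_power_law(sizes, baseline_errors)
var_a, var_b = fit_power_law(sizes, variant_errors)
predicted_gap = extrapolate(base_a, base_b, full_size) - extrapolate(
    var_a, var_b, full_size
)
```

In this toy data the variant's advantage shrinks at every step up in data size, so the extrapolated gap at full scale is smaller than the gap measured at the largest small run. A comparison like this only ranks candidates; extrapolating two orders of magnitude beyond the measured range is fragile, so it would complement the periodic full rebuild and not replace it.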

Are there better ways to drive this process? Bonus points if backed up by research!

Our current pattern is something like:

  • Build multiple new things
  • Test at much smaller data volumes
  • Do an approximately full rebuild (although see below) every week or two, leveraging all new features/changes, and see if it moves the needle.

This gives us some results, but also means that if we have multiple new features/model changes, it is very hard to disambiguate the effect of each one at scale. (And heaven help us if the overall performance goes down, despite all of our upfront testing.)

For ablations, we are instead left with running them at much lower data volumes and using our best human intuition to decide what to apply at scale.

It all kind of works, but is…unsatisfying. And probably unoptimized.

Notes:

  • There are obviously lots of techniques to try to cram down the overall cost of re-training (e.g., warm starts from prior models or other pre-trained entities like RoBERTa). We are actively testing here; suggestions to decrease cost/wallclock are welcome.

  • Do we “need” to use all the data and/or use larger params to eke out every last bit of accuracy? To simplify the discussion here, let’s say yes, or at least assume that we’ve done a fairly good job of already doing that testing to figure out the right trade-off point between model maximization and accuracy required at the business level.

submitted by /u/farmingvillein

[P] torchdata: Implement map, cache, filter etc. within PyTorch’s Datasets (like TensorFlow’s tf.data and more)

Hi /r/MachineLearning,

What is torchdata

I would like to present a new open-source, PyTorch-based project (torchdata) which extends the capabilities of torch.utils.data.Dataset by bringing map, cache and other operations known from tf.data.Dataset (and actually a little more than that).

All that with a single line of code: super().__init__()

For more, check the documentation or the GitHub repository.

Functionalities Overview

  • Use map, apply, reduce or filter
  • Cache data in RAM or on disk (even partial caching, say the first 20% in RAM and the rest on disk)
  • Full support for PyTorch’s Dataset and IterableDataset (including torchvision)
  • General torchdata.maps like Flatten or Select
  • Concrete torchdata.datasets designed for file reading and other general tasks

Example

  • Create image reading dataset

    import pathlib

    import torchdata
    import torchvision
    from PIL import Image


    class Images(torchdata.Dataset):  # Different inheritance
        def __init__(self, path: str):
            super().__init__()  # This is the only change
            # Collect every file found directly under `path`
            self.files = [file for file in pathlib.Path(path).glob("*")]

        def __getitem__(self, index):
            # Open the image lazily, only when it is requested
            return Image.open(self.files[index])

        def __len__(self):
            return len(self.files)
  • map each element to torch.Tensor and cache() everything in memory:

    images = Images("./data").map(torchvision.transforms.ToTensor()).cache() 
  • concatenate with labels (another torchdata.Dataset instance) and iterate over:

    for data, label in images | labels:
        # Do whatever you want with your data
        pass
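To give a feel for what chainable `map` and `cache` mean on a map-style dataset, here is a small pure-Python sketch. It is not torchdata's implementation and does not depend on PyTorch; the class names `SketchDataset`, `_Mapped` and `_Cached` are hypothetical and exist only for this illustration.

```python
class SketchDataset:
    """Map-style dataset with chainable map() and cache() (illustrative only)."""

    def __init__(self, items):
        self._items = items

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def map(self, function):
        # Returns a new dataset; nothing is computed yet.
        return _Mapped(self, function)

    def cache(self):
        # Returns a new dataset that remembers every sample it has served.
        return _Cached(self)


class _Mapped(SketchDataset):
    def __init__(self, source, function):
        self._source = source
        self._function = function

    def __len__(self):
        return len(self._source)

    def __getitem__(self, index):
        # The function is applied lazily, on every access.
        return self._function(self._source[index])


class _Cached(SketchDataset):
    def __init__(self, source):
        self._source = source
        self._store = {}

    def __len__(self):
        return len(self._source)

    def __getitem__(self, index):
        # Compute a sample once, then serve it from memory.
        if index not in self._store:
            self._store[index] = self._source[index]
        return self._store[index]
```

With this design, `SketchDataset([1, 2, 3]).map(f).cache()` calls `f` at most once per index, which is why the order of `map` and `cache` in a chain matters: anything placed after `cache()` is recomputed on every access.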

Installation

pip is the easiest of course:

 pip install torchdata 

You can also use nightly releases (torchdata-nightly) or GPU/CPU Docker-based images (check the documentation). Hopefully a conda package will be released soon as well; stay tuned.

BTW, you can also check out torchfunc; I plan to make a separate post about that in a week or so.

Thanks for checking this out; any input would be welcome (either here or on GitHub).

submitted by /u/szymonmaszke

[D] Siraj Raval – Potentially exploiting students, banning students asking for refund. Thoughts?

I’m not a personal follower of Siraj, but this issue came up in an ML Facebook group that I’m part of. I’m curious to hear what you all think.

It appears that Siraj recently offered a course, “Make Money with Machine Learning”, with a registration fee, but did not follow through on promises made in the initial offering of the course. On top of that, he created a refund and warranty page with information regarding the course after people had already paid. Here is a link to a Wayback Machine capture of someone’s documentation of Siraj’s potential misdeeds: https://web.archive.org/save/https://case-for-a-refund.s3.us-east-2.amazonaws.com/feedback.html

According to Twitter threads, he has been banning anyone in his Discord/Slack who has been asking for refunds.

On top of this, there are many Twitter threads regarding his behavior. A screenshot (bottom of post) shows an account that has since been deactivated/deleted (assuming that the individual either agreed to shut down their account for money or was banned). Here is a Wayback Machine archive link of a Twitter search for the user in the screenshot: https://web.archive.org/web/20190921130513/https:/twitter.com/search?q=safayet96434935&src=typed_query. In the search results it is apparent that many students have been impacted by Siraj.

UPDATE: Additional searching on Twitter has yielded many more posts, check out the tweets/retweets of these people: student1 student2

https://i.redd.it/sqkjkhjz3yn31.jpg

submitted by /u/nord2rocks