Category: Reddit MachineLearning

[D] Are there current AIs with a high enough degree of autonomy that their programmers are unable to control everything they do?

Written on September 21, 2019. Posted in Reddit MachineLearning.

As the title states, I just want to know what level of autonomy the most advanced AIs have. I’m trying to understand whether we’re at a point where an AI can perform an action that its programmer might not be able to prevent.

submitted by /u/SirHovaOfBrooklyn

[P] PyChubby – Face Warping

Written on September 21, 2019. Posted in Reddit MachineLearning.

What does it do?

Change facial expressions and shapes with zero effort.

How exactly?

Give it a photo, define what to do with the faces (smile, open eyes, shrink, …), and pychubby will do the rest.

Features

No need to manually specify landmarks
Works on any photo with an arbitrary number of faces
Can be used for deep learning augmentations
… or just creating funny photos/videos

Links

github

blogpost

docs

I would appreciate any feedback, especially regarding the possibility of using it for augmentation in face-recognition tasks. Also, if anyone is interested in contributing or suggesting new features, I would be more than happy to hear from them!

PyChubby in action

submitted by /u/kjanofficial

[D] Analyzing thousands of podcast transcripts – interesting project ideas & best algos?

Written on September 20, 2019. Posted in Reddit MachineLearning.

Hi – we’re a small startup focusing on podcast transcriptions. We are working to make them readable, searchable, etc. We have previously used TF-IDF + LDA topic modeling to extract the underlying topics in the corpus and to compute related podcasts.
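The TF-IDF half of such a pipeline can be sketched in a few lines: weight each term by how often it appears in a transcript and how rare it is across the corpus, then rank other transcripts by cosine similarity to find related podcasts. This is a minimal pure-Python sketch, assuming whitespace tokenization and a smoothed IDF of log((1 + N) / (1 + df)) + 1; the function names are hypothetical, the LDA step is left out entirely, and a production setup would more likely use a library such as scikit-learn’s `TfidfVectorizer`.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many transcripts contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Relative term frequency times smoothed inverse document frequency.
            term: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def related(docs, index, k=5):
    """Indices of the k documents most similar to docs[index], best first."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine(vectors[index], vec), j)
        for j, vec in enumerate(vectors)
        if j != index
    ]
    return [j for _, j in sorted(scores, reverse=True)[:k]]
```

Calling `related(transcripts, 0, k=5)` on a list of transcript strings would return the indices of the five transcripts closest to the first one.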

Potential ideas for future interesting projects include:

automatically identifying ads
picking up trends/sentiment on politics, people, companies, etc.
finding promo codes
auto-generating podcast summaries
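Of these ideas, finding promo codes is probably the easiest to prototype with a rule-based baseline before reaching for a learned model. The sketch below is a hypothetical heuristic, not an established method: it matches a cue phrase such as “promo code” or “use code” case-insensitively and captures the following token of 3–20 uppercase letters or digits, so it assumes the transcription step keeps spelled-out codes in uppercase.

```python
import re

# Hypothetical heuristic: a cue phrase ("promo code", "use code", ...), matched
# case-insensitively via scoped (?i:...) flags, followed by a token of 3-20
# uppercase letters/digits. Lowercase words after the cue are not captured.
PROMO_CODE_RE = re.compile(
    r"\b(?i:promo|coupon|discount|offer|use)\s+(?i:code)\s+([A-Z0-9]{3,20})\b"
)


def find_promo_codes(transcript):
    """Return every promo-code candidate found in the transcript, in order."""
    return [match.group(1) for match in PROMO_CODE_RE.finditer(transcript)]
```

A rule like this gives cheap labels to bootstrap from, and its misses are natural candidates for a learned extractor later.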

What would you find most interesting, and why?

submitted by /u/pbirsinger

[P] Torchelie – Pytorch extended with more optimizers, losses, layers, models, transformations, and even training loops

Written on September 20, 2019. Posted in Reddit MachineLearning.

Hi guys,

I’ve been working on Torchelie for something like two or three months now. Notable additions are:

bitempered loss
VQ layer from VQ-VAE
RAdam
SPADE / Conditional BatchNorm
Neural Style loss / Deep Dream loss / Feature viz loss
Spectral image to reproduce distill.pub’s amazing feature viz results
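For readers unfamiliar with the VQ layer from VQ-VAE: its forward pass replaces each encoder output vector with the nearest entry of a learned codebook. The following is a minimal, framework-free sketch of only that nearest-neighbour lookup under squared Euclidean distance. It is not Torchelie’s implementation, and it omits the codebook/commitment losses and the straight-through gradient estimator that a trainable layer needs.

```python
def quantize(vectors, codebook):
    """Replace each vector by its nearest codebook entry (squared L2 distance).

    Returns (indices, quantized), where indices[i] is the code chosen for vectors[i].
    """
    indices, quantized = [], []
    for z in vectors:
        # Squared Euclidean distance from z to every codebook entry.
        distances = [sum((zi - ei) ** 2 for zi, ei in zip(z, entry)) for entry in codebook]
        best = min(range(len(codebook)), key=distances.__getitem__)
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
codes, z_q = quantize([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]], codebook)
print(codes)  # [0, 1, 2]
```

The discrete `codes` are what a VQ-VAE stores or models downstream; `z_q` is what the decoder sees.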

Anyway, here’s the link: https://github.com/Vermeille/Torchelie/ and the documentation is here: https://torchelie.readthedocs.io/

Criticism and bug reports are absolutely welcome.

Thank you for trying it out 🙂

submitted by /u/Vermeille

[D] How important was your Masters for the industry skills?

Written on September 20, 2019. Posted in Reddit MachineLearning.

I am deciding whether to enroll in a one-year or a two-year master’s programme. I’ve asked people I personally know how much their classes mattered for their job, and most of them said only a small portion of the classes did (1–3 classes). How much did it matter for you?

Edit: I am about to finish a 4-year (240 ECTS) undergrad.

submitted by /u/starzmustdie

How does SOTA compare across languages in NLP? [Discussion]

Written on September 20, 2019. Posted in Reddit MachineLearning.

There has obviously been amazing progress in deep-learning-based NLP/speech technology in recent years. As an English speaker, I was wondering how the state of the art varies across languages.

How good are things like sentiment analysis, question answering, and speech recognition in non-English languages?

Does anyone know of any failure modes in non-English European languages? And are there any good open-source libraries for NLP in South Asian languages like Hindi and Urdu?

thanks!

submitted by /u/Razcle

[D] Convergent evolution in next-generation neural networks?

Written on September 20, 2019. Posted in Reddit MachineLearning.

In the research quest for next-generation neural networks, it seems like two “schools” of thinking are converging on something somewhat similar, even after approaching it from different angles.

I’m referring to the HTM/sparse coding/bio-inspired group and the “classical” deep learning group.

Historically, the principles espoused by the former group (e.g. Numenta, Ogma Neo, Sparsey) have not gotten much traction in the ML community due to a lack of experimental results.

However, more recently, approaches from within the deep learning community have begun to somewhat resemble the former’s approach(es). For example, OpenAI is now intensely interested in extremely sparse networks ( https://supercomputersfordl2017.github.io/Presentations/SmallWorldNetworkArchitectures.pdf ), and Geoff Hinton is working on capsules, which are at least partially inspired by cortical columns ( https://numenta.com/blog/2017/12/18/comparing-capsules-with-htm/) and share some similarities with them.

Just as interestingly, it appears that some non-backpropagation local learning algorithms, such as Hebbian learning ( https://arxiv.org/abs/1908.08993 ) can actually scale to CIFAR and a small version of ImageNet. Of course, these results are very preliminary, but are at least somewhat interesting.
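To make “local learning” concrete: a Hebbian update changes a weight using only the activity of the neurons it connects, with no backpropagated error. The sketch below uses Oja’s rule, a classic stabilized Hebbian rule, Δw = η · y · (x - y · w) with y = w · x, for a single linear neuron. It is only an illustration of the family and not necessarily the algorithm used in the linked paper. With the toy inputs assumed here (points along the (1, 1) direction), the weight vector converges to the unit vector along that direction, i.e. the first principal component of the data.

```python
def oja_train(samples, lr=0.05, epochs=50, w=(1.0, 0.0)):
    """Train one linear neuron y = w.x with Oja's rule; returns the final weights."""
    w = list(w)
    for _ in range(epochs):
        for x in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output
            # Hebbian term (y * x) minus the decay (y^2 * w) that keeps |w| near 1.
            # The update uses only x, y and w: no error signal is backpropagated.
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


# Toy inputs (an assumption for illustration): points along the (1, 1) direction.
samples = [(1.0, 1.0), (-1.0, -1.0), (2.0, 2.0), (-2.0, -2.0)]
weights = oja_train(samples)
print(weights)  # close to [0.7071, 0.7071]
```

Plain Hebbian learning (Δw = η · y · x) would grow the weights without bound; the subtracted y² · w term is what keeps Oja’s rule stable.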

So I’d like to hear people’s thoughts on this. Maybe there is an interesting convergent-evolution phenomenon where the approaches to next-generation NNs end up being somewhat similar.

submitted by /u/darkconfidantislife

[D] Predicting value of feature/model engineering on very large data sets?

Written on September 20, 2019. Posted in Reddit MachineLearning.

Setup: sizeable corpus (1TB+) of text data. The problem space roughly generalizes into a seq2seq model.

Given the size and the required parameter space to fully exploit the data, retraining the model against the full data set is (understandably) very expensive ($10ks / full training run).

Iterative pre-processing of the data is of course pricey as well.

Problem: How can we estimate the effect of some new feature/model-engineering without re-training on the entire data set?

Obviously we can do training runs at smaller amounts of data and/or with small parameter sizes. This is of course a good start; if something doesn’t work at smaller volumes of data, it usually isn’t helpful at larger volumes of data. But, at scale, enough data tends to wash away the value of many types of feature engineering and/or notionally more clever models.
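One way to make small-scale runs more predictive, borrowed from empirical scaling-law work such as Hestness et al. (2017) rather than from this post, is to train each variant at several data sizes, fit a power law loss ≈ a · n^(-b) to the results, and compare the fitted curves extrapolated to the full data size. A minimal sketch, assuming a pure power law with no irreducible-loss term (real curves often flatten out, in which case a three-parameter fit may be needed); the function names are hypothetical:

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(n); returns (a, b)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def extrapolate(sizes, losses, target_size):
    """Predicted loss at target_size under the fitted power law."""
    a, b = fit_power_law(sizes, losses)
    return a * target_size ** (-b)
```

Fitting the baseline and a candidate change separately and comparing their extrapolated losses at the full corpus size hints at whether a small-data gain will survive: a variant that is ahead at small n but has a smaller exponent b can fall behind the baseline as n grows, which is one way the “enough data washes away feature engineering” effect shows up.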

Are there better ways to drive this process? Bonus points if backed up by research!

Our current pattern is something like:

Build multiple new things
Test at much smaller data volumes
Do an approximately full rebuild (although see below) every week or two, leveraging all new features/changes, and see if it moves the needle.

This gives us some results, but also means that if we have multiple new features/model changes that it is very hard to disambiguate the effect of new features at scale. (And heaven help us if the overall performance goes down, despite all of our upfront testing.)

To do ablations we’re instead left with doing ablations at much lower data volumes and using our best human intuition to decide what to apply at scale.
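One systematic alternative to bundling all changes into a single rebuild is a leave-one-out ablation: one run with every new change enabled, plus one run per change with only that change disabled, so N changes cost N + 1 runs. A minimal sketch of planning such runs and attributing a metric delta to each change; the names are hypothetical, it assumes a higher metric is better, and leave-one-out deltas do not fully capture interactions between changes.

```python
def leave_one_out_plan(changes):
    """Map run name -> set of enabled changes: all on, then each one dropped."""
    changes = list(changes)
    plan = {"all": set(changes)}
    for name in changes:
        plan[f"without_{name}"] = set(changes) - {name}
    return plan


def attribute(metrics, changes):
    """Estimated contribution of each change: metric(all) - metric(without it).

    Assumes a higher metric is better; interactions between changes are not isolated.
    """
    return {name: metrics["all"] - metrics[f"without_{name}"] for name in changes}


# Hypothetical usage: run every config in the plan at the chosen data volume,
# collect one metric per run name, then attribute the differences.
plan = leave_one_out_plan(["new_tokenizer", "extra_feature"])
```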

It all kind of works, but is…unsatisfying. And probably unoptimized.

Notes:

There are obviously lots of techniques to try to cut the overall cost of re-training (e.g., warm starts from prior models or from other pre-trained models like RoBERTa). We are actively testing here; suggestions to decrease cost/wall-clock time are welcome.
Do we “need” to use all the data and/or larger parameter counts to eke out every last fraction of accuracy? To simplify the discussion here, let’s say yes, or at least assume that we’ve already done a fairly good job of testing to find the right trade-off point between model maximization and the accuracy required at the business level.

submitted by /u/farmingvillein

[P] torchdata: Implement map, cache, filter etc. within PyTorch’s Datasets (like Tensorflow’s tf.data and more)

Written on September 20, 2019. Posted in Reddit MachineLearning.

Hi /r/MachineLearning,

What is torchdata

I would like to present a new open-source, PyTorch-based project (torchdata), which extends the capabilities of torch.utils.data.Dataset by bringing map, cache and other operations known from tf.data.Dataset (and actually a little more than that).

All that with a single line of code: `super().__init__()`

For more, check documentation or github repository.

Functionalities Overview

Use map, apply, reduce or filter
cache data in RAM or on disk (even partial caching, e.g. the first 20% in RAM and the rest on disk)
Full PyTorch’s Dataset and IterableDataset support (including torchvision)
General torchdata.maps like Flatten or Select
Concrete torchdata.datasets designed for file reading and other general tasks
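The idea behind these chainable operations can be illustrated without PyTorch. The sketch below is a hypothetical, minimal pure-Python dataset with a lazy `map`, an eager `filter` (it has to scan once to know the new length) and an in-memory `cache`. It is not torchdata’s implementation or API, and it omits disk and partial caching.

```python
class Dataset:
    """Hypothetical random-access dataset with chainable map/filter/cache."""

    def __init__(self, getitem, length):
        self._getitem = getitem
        self._length = length

    @classmethod
    def from_list(cls, items):
        items = list(items)
        return cls(items.__getitem__, len(items))

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._getitem(index)

    def __iter__(self):
        return (self[i] for i in range(len(self)))

    def map(self, fn):
        # Lazy: fn runs only when an element is accessed.
        return Dataset(lambda i: fn(self[i]), len(self))

    def filter(self, predicate):
        # Eager: the new length is unknown until every element has been checked once.
        kept = [i for i in range(len(self)) if predicate(self[i])]
        return Dataset(lambda j: self[kept[j]], len(kept))

    def cache(self):
        # Memoize elements in RAM after their first access.
        store = {}

        def getitem(i):
            if i not in store:
                store[i] = self[i]
            return store[i]

        return Dataset(getitem, len(self))


calls = []


def slow_square(x):
    calls.append(x)  # record every real computation
    return x * x


ds = Dataset.from_list(range(6)).filter(lambda x: x % 2 == 0).map(slow_square).cache()
print(list(ds))  # [0, 4, 16]
print(list(ds))  # [0, 4, 16], served from the cache
print(calls)     # [0, 2, 4]: slow_square ran once per element
```

Putting `cache()` after the expensive `map` is the design point: the transformation runs once per element, and later epochs read from memory.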

Example

Create image reading dataset

```
import pathlib

import torchdata
import torchvision
from PIL import Image


class Images(torchdata.Dataset):  # Different inheritance
    def __init__(self, path: str):
        super().__init__()  # This is the only change
        self.files = [file for file in pathlib.Path(path).glob("*")]

    def __getitem__(self, index):
        return Image.open(self.files[index])

    def __len__(self):
        return len(self.files)
```

map each element to torch.Tensor and cache() everything in memory:

```
images = Images("./data").map(torchvision.transforms.ToTensor()).cache()
```

concatenate with labels (another torchdata.Dataset instance) and iterate over:
```
for data, label in images | labels:
    # Do whatever you want with your data
    pass
```

Installation

pip is the easiest of course:

```
pip install torchdata
```

You can also use nightly releases (torchdata-nightly) or GPU/CPU Docker-based images (check the documentation). Hopefully a conda package will be released soon as well; stay tuned.

BTW, you can also check out torchfunc; I plan to make a separate post about it in a week or so.

Thanks for checking this out. Any input would be welcome (either here or on GitHub).

submitted by /u/szymonmaszke

[D] Siraj Raval – Potentially exploiting students, banning students asking for refund. Thoughts?

Written on September 20, 2019. Posted in Reddit MachineLearning.

I’m not a personal follower of Siraj, but this issue came up in an ML Facebook group that I’m part of. I’m curious to hear what you all think.

It appears that Siraj recently offered a course, “Make Money with Machine Learning”, with a registration fee, but did not follow through on the promises made in the initial offering of the course. On top of that, he created a refund and warranty page with information regarding the course after people had already paid. Here is a link to a Wayback Machine capture of someone’s documentation of Siraj’s potential misdeeds: https://web.archive.org/save/https://case-for-a-refund.s3.us-east-2.amazonaws.com/feedback.html

According to Twitter threads, he has been banning anyone in his Discord/Slack who asks for a refund.

On top of this, there are many Twitter threads regarding his behavior. A screenshot (bottom of post) shows an account that has since been deactivated/deleted (presumably because the individual either agreed to shut down their account for money or was banned). Here is a Wayback Machine archive link of a Twitter search for the user in the screenshot: https://web.archive.org/web/20190921130513/https:/twitter.com/search?q=safayet96434935&src=typed_query. The search results show that many students have been affected by Siraj.

UPDATE: Additional searching on Twitter has yielded many more posts, check out the tweets/retweets of these people: student1 student2

https://i.redd.it/sqkjkhjz3yn31.jpg

submitted by /u/nord2rocks

Blog

Learn About Our Meetup

5000+ Members

MEETUPS

JOB POSTINGS

CONTACT
