Category: Reddit MachineLearning

[P] These Lyrics Do Not Exist

Written on August 9, 2019. Posted in Reddit MachineLearning.

I have trained a songwriting AI that creates completely original song lyrics for you!

You just provide a song topic word, then press “Generate Lyrics” for completely new song lyrics.

Please let me know if you have any ideas for improvements.

Link: https://theselyricsdonotexist.com/ 🤖🎤🎶

submitted by /u/itsmybirthday19

[D] Should tokens with a very small frequency be removed from the vocabulary before training a word2vec type model?

Written on August 8, 2019. Posted in Reddit MachineLearning.

I believe they should, as they appear in only one or a few contexts, so it wouldn’t be possible to learn a good representation for them.
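As a rough illustration of the idea, here is a minimal sketch in plain Python that replaces tokens below a frequency threshold with a placeholder before training. The function name `prune_vocab`, the `min_count` value, and the `<unk>` placeholder are assumptions made for this example, not part of any specific library.

```python
from collections import Counter

def prune_vocab(sentences, min_count=5, unk_token="<unk>"):
    """Replace tokens seen fewer than min_count times with unk_token."""
    # Count every token across the whole corpus.
    counts = Counter(tok for sent in sentences for tok in sent)
    # Keep only tokens that occur often enough to learn from.
    vocab = {tok for tok, c in counts.items() if c >= min_count}
    pruned = [[tok if tok in vocab else unk_token for tok in sent]
              for sent in sentences]
    return pruned, vocab

# Toy corpus: "cat", "dog" and "axolotl" each appear only once.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["the", "axolotl", "sat"],
]
pruned, vocab = prune_vocab(corpus, min_count=2)
```

Libraries such as gensim expose a similar threshold (`min_count` in `Word2Vec`), which discards rare tokens rather than mapping them to a placeholder.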

submitted by /u/searchingundergrad

[D] Facebook AI interview process

Written on August 8, 2019. Posted in Reddit MachineLearning.

Hi /r/MachineLearning,

I am interviewing onsite with Facebook AI later this month, as a researcher in one of their applied Machine Learning teams, and was wondering if anyone here has gone through their process. What kind of questions did they ask, and how were the interviews conducted? How did you find the experience?

I have taken several interviews before focused on coding, so I am familiar with those, but I was told at Facebook they have “domain” and “out-of-domain” interview rounds where I will be asked questions by ML experts, on things both in and outside of my area of research expertise, so I am a bit scared of what to expect as I never went through something like this before.

Would be super grateful for any advice or insights into the process!

submitted by /u/fb_ai_7262

[D] How to convert a pretrained TensorFlow model to PyTorch – a simple workflow and a few lessons learned

Written on August 8, 2019. Posted in Reddit MachineLearning.

A simple guide by HuggingFace on how to convert a pretrained TensorFlow model to PyTorch easily and reliably.

The blog post summarizes the workflow they use to make fast and accurate TensorFlow to PyTorch conversions and shares some lessons learned from reimplementing a bunch of TensorFlow models in the pytorch-transformers open-source library.

Here is the blog post: https://medium.com/huggingface/from-tensorflow-to-pytorch-265f40ef2a28

submitted by /u/Thomjazz

[P] entity resolution system for large-scale databases

Written on August 8, 2019. Posted in Reddit MachineLearning.

Hello everyone,

I’d like to share some insights about a Wikimedia Foundation project I’ve been contributing to.

soweego is an entity resolution system that links the Wikidata knowledge base to large external databases through a set of supervised algorithms: https://soweego.readthedocs.io/

Specifically, we leveraged Bernoulli Naïve Bayes, Linear Support Vector Machines, Single-layer Perceptrons, and Multi-layer Perceptrons. As an interesting finding, models based on Single-layer Perceptrons are the ones that work best for our input datasets, namely Discogs, IMDb, and MusicBrainz.
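To make the single-layer perceptron setting concrete, the following is a minimal sketch and not soweego’s actual code: a pair of records is turned into a small vector of similarity features, and a single sigmoid unit scores whether the pair refers to the same entity. The features, the toy training pairs, and the names `features`, `predict`, and `train` are all assumptions chosen for illustration.

```python
import math

def features(a, b):
    """Hypothetical similarity features for a pair of records."""
    name_a = set(a["name"].lower().split())
    name_b = set(b["name"].lower().split())
    jaccard = len(name_a & name_b) / len(name_a | name_b)  # name token overlap
    same_year = 1.0 if a["born"] == b["born"] else 0.0     # birth year agreement
    return [jaccard, same_year]

def predict(weights, bias, x):
    """Single sigmoid unit: probability that the pair is a match."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=2000, lr=0.5):
    """Plain stochastic gradient descent on the logistic loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in samples:
            err = predict(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

# Toy labelled pairs (1 = same entity, 0 = different entities).
pairs = [
    ({"name": "John Lennon", "born": 1940}, {"name": "John Winston Lennon", "born": 1940}, 1),
    ({"name": "John Lennon", "born": 1940}, {"name": "John Legend", "born": 1978}, 0),
    ({"name": "Nina Simone", "born": 1933}, {"name": "Nina Simone", "born": 1933}, 1),
    ({"name": "Nina Simone", "born": 1933}, {"name": "Paul Simon", "born": 1941}, 0),
]
samples = [(features(a, b), y) for a, b, y in pairs]
weights, bias = train(samples)

match_score = predict(
    weights, bias,
    features({"name": "Miles Davis", "born": 1926},
             {"name": "Miles Dewey Davis", "born": 1926}),
)
```

A score like this can then be thresholded, for example to separate confident links from ones that need human curation; any specific cut-off would have to be tuned on held-out data.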

soweego partners with Mix’n’match, which mainly deals with small catalogs. soweego is currently uploading 255 k confident identifiers to Wikidata, see its activity. 126 k medium-confident links are instead getting into Mix’n’match for curation.

The soweego team has also worked hard to address the following community requests:

sync Wikidata to external databases and check them to spot inconsistencies in Wikidata;
import new databases with reasonable effort.

If you like the project, please consider starring it on GitHub: https://github.com/Wikidata/soweego

submitted by /u/tupini07

[P] Simple PyTorch implementation of Language Model

Written on August 8, 2019. Posted in Reddit MachineLearning.

A step-by-step tutorial on how to implement a simple language model and adapt it to Wikipedia text.

Pre-trained BERT and XLNet models are publicly available! But for NLP beginners like me, they can be hard to use or adapt with a full understanding of how they work. For those readers, I cover the whole end-to-end implementation process for language modeling, using the recurrent networks we already know. Also, it does not use torchtext!

I hope that this repo can be a good solution for people who want to have their own language model 🙂

https://github.com/lyeoni/pretraining-for-language-understanding

submitted by /u/lyeoni

[N] Hi r/ML! You probably know about our TRAINS platform, but do you know why we open-sourced it?

Written on August 8, 2019. Posted in Reddit MachineLearning.

“TRAINS: An open-source, zero-integration tool to boost machine learning research”
https://heartbeat.fritz.ai/trains-all-aboard-ba92a728eb6d

In this piece, specifically the second part (Platform 1), I tried to convey concisely why we made our platform open source, which is something I felt I had left to hand-waving in my previous posts here.

I would love to hear from r/MachineLearning if that particular message comes through!

… and as usual if anything does not “magically” work for you 😉

Context: First and Second posts here.

The rest of the piece is also recommended if you don’t know what TRAINS is or do not believe it can boost your ML research. Enjoy!

PS. Shout out to fritz.ai for hosting us on heartbeat!

PS/2 Mods, this is probably more [N] than [D] or [P], but I can accept it if you change the flair!

submitted by /u/LSTMeow

[D] Multi-style disentanglement and Unsupervised aesthetics prediction of music by predicting future and analysing past

Written on August 8, 2019. Posted in Reddit MachineLearning.

The problem with aesthetics prediction is that it is learned on datasets provided by a subset of users, which might not reflect the diversity of different people’s aesthetic perception and may generalize poorly.

I’ve been thinking about using content and style disentanglement to learn several styles (and the relations between them), and then feeding on mini-supervision from a human who selects the images they personally find most beautiful, which would make the algorithm look for the most similar styles.

However, to expose the model to as many styles as possible, it should have an intrinsic motivation (curiosity) to explore the styles it struggles to disentangle more than those it has already disentangled (almost) successfully, and then learn to combine them, followed by a model that disentangles several styles and a single content.

The next topic is music creation. Today’s best-performing models apparently learn on several musical genres and then synthesize a new sample by starting from scratch and predicting the next note until the piece reaches several minutes.

To make the music piece more tense, I believe these steps might need to be in place:

1. Learn one model to predict the future arrangement of a song at any given time, and then make the generator minimize the certainty of this model.

2. Learn another model to analyze the past of the song, and have the generator maximize the recognition rate (certainty) of this one as well.

(These two models may share their knowledge, as the task is done on the same musical pieces.)

So, both of these models are trained to recognize genres and learn their patterns, except that the first one focuses on the future (which the model has not heard yet) and the other on the past (which it has already heard).

Obviously, the aesthetics score would be produced from the models’ rewards for analyzing/predicting that specific song (as specified in the two steps above).

submitted by /u/ad48hp

Regarding beginner’s guides

Written on August 8, 2019. Posted in Reddit MachineLearning.

Hi all,

/r/machinelearning is growing rampantly, with over a thousand new subscribers every day. As our community grows, it is important to have fertile ground for newcomers to learn the ropes. Since there is already an active subreddit for aiding in the development of machine learning skills, we feel that this is the right time to demarcate the content between these two subs.

As a new rule, all beginner-level content should be posted to our sister sub, /r/learnmachinelearning. This will free up “real estate” on our page for more in-depth, expert discussions and provide a more focused learning space for beginners. That’s not to say that all tutorials are outright banned — in particular, explanations of recent or niche papers are still welcome.

We were all beginners once and newcomers to ML are bringing great things to this sub and the general community. Please do continue to engage with and learn from the community here. But we recommend /r/learnmachinelearning if you do want to start getting your hands dirty.

We hope that this specialization will be beneficial to everyone in the long run.

Best regards, the moderator team

submitted by /u/MTGTraner

[D] Submitting code while preserving anonymity

Written on August 8, 2019. Posted in Reddit MachineLearning.

I received some negative feedback on a recent paper submission stating that my results would not be reproducible. However, I took care to only use easily accessible, public benchmark data and wrote code to make it easy for anyone to reproduce my results. I did remove the GitHub link to the code in my submission to preserve anonymity (stating in the submission that this was the reason for not providing a functioning link). What is the best way to handle this in the future without compromising double or triple blind reviewing? Especially if the code should remain private until acceptance.

submitted by /u/instantlybanned
