Category: Reddit MachineLearning

[P] – NLP and Optimisation project feedback, is this a good idea?

Written on November 5, 2019. Posted in Reddit MachineLearning.

Hey guys, looking for some feedback. Delete if not allowed.

We’re a data science company that runs competitions (like Kaggle for finance). We’re about to run our first competition that requires ML approaches to do well. The competition is for students in the UK, and the problems include an optimisation problem (e.g. solvable with reinforcement learning) and an NLP problem (e.g. solvable by fine-tuning pre-trained neural networks).

The problems are sourced from top firms and are real problems that their data scientists and quants are working on. There are three problems covering different aspects of data science with a focus on the finance space.

Cash Prizes of up to $5000
AWS Credits up to $1000
10x Genuine Impact Investment Research Subscriptions

Does this sound like an interesting project or challenge? Would you as a student be interested?

submitted by /u/DaveatAuquan

[R] The Measure of Intelligence

Written on November 5, 2019. Posted in Reddit MachineLearning.

submitted by /u/Reiinakano

[D] Is Reinforcement Learning Practical?

Written on November 5, 2019. Posted in Reddit MachineLearning.

Is reinforcement learning practical at this point for industry work? The most prominent examples we see are from DeepMind (AlphaStar, AlphaGo), but that team consists of world-class researchers (over 40 of them) with a ton of computing resources, who also worked closely with expert StarCraft II players.

As someone who hasn’t had much experience in RL, I see potential applications but am unsure of the amount of work involved or how practical it is. For example, one potential application for RL is to learn fraudulent behavior in an online retailer system (e.g. Amazon, eBay) and proactively find methods of fraud before they happen. One could imagine the unintended behavior that arises from a misspecified reward function being useful for finding exploits in a system (https://openai.com/blog/faulty-reward-functions/). But there are a lot of issues to overcome, such as sample inefficiency (some are covered in this article: https://www.alexirpan.com/2018/02/14/rl-hard.html), not to mention having to build your own simulator (and hope it’s representative to some degree).

What are people’s opinions on the practicality of using RL for something like fraud? Does it even make sense to build a simple online retailer simulator? I ask because, while I think RL is quite powerful, it feels like it isn’t quite ready to be used. I would love to be shown to be wrong.

submitted by /u/edelweiss_ml

[R] [P] UNC BIAG Releases Mermaid, a PyTorch-based image registration toolkit

Written on November 5, 2019. Posted in Reddit MachineLearning.

We are thrilled to release our image registration toolkit after a long time! 🔥🔥

You can quickly prototype and test your image registration pipelines with Mermaid, based on PyTorch. 🌟

Mermaid makes it convenient to use GPU acceleration for registration models. PRs and questions are welcome! 🙌

We also have another package called easyreg, which wraps Mermaid and uses deep networks.

Github repository: https://github.com/uncbiag/mermaid

Documentation: https://mermaid.readthedocs.io/en/latest/index.html

Related papers:

Region-specific Diffeomorphic Metric Mapping https://arxiv.org/pdf/1906.00139.pdf https://github.com/uncbiag/easyreg

Zhengyang Shen, François-Xavier Vialard, Marc Niethammer. NeurIPS 2019.

Networks for Joint Affine and Non-parametric Image Registration https://arxiv.org/pdf/1903.08811.pdf https://github.com/uncbiag/easyreg

Zhengyang Shen, Xu Han, Zhenlin Xu, Marc Niethammer. CVPR 2019.

Metric Learning for Image Registration https://arxiv.org/pdf/1904.09524.pdf

Marc Niethammer, Roland Kwitt, François-Xavier Vialard. CVPR 2019.

Quicksilver: Fast predictive image registration–a deep learning approach https://arxiv.org/pdf/1703.10908.pdf https://github.com/rkwitt/quicksilver

Xiao Yang, Roland Kwitt, Martin Styner, Marc Niethammer, NeuroImage 2017.

Fast Predictive Image Registration https://github.com/rkwitt/FastPredictiveImageRegistration

Xiao Yang, Roland Kwitt, Marc Niethammer. DLMIA 2016.

submitted by /u/SahinOlut

[D] Given the recent news about plagiarism, will this be even more of a problem in the future?

Written on November 5, 2019. Posted in Reddit MachineLearning.

A couple of examples:

https://www.reddit.com/r/MachineLearning/comments/dq82x7/discussion_a_questionable_sigir_2019_paper/

https://www.reddit.com/r/MachineLearning/comments/dh2xfs/d_siraj_has_a_new_paper_the_neural_qubit_its/

Both papers were easy to catch because they copied large sections of text word for word. But with more aggressive word substitution and NLP applications getting better, this will get much harder to detect in the future.

Are we going to see plagiarism on the rise in the near future?

submitted by /u/FirstTimeResearcher

[D] Random Forests and Decision Trees

Written on November 5, 2019. Posted in Reddit MachineLearning.

I am working on a binary classification problem where I currently fit a decision tree on the data with 100 different random seeds, then tally the outputs to determine the final predicted class. So if it comes out 1 75 times and 0 25 times, the final prediction is 1. I am using a pure majority vote (in the event of a tie, I go with 0). Would there be any benefit to running the exact same thing, but with 100 different random forests? In other words, will a decision tree and a random forest get the same examples wrong, but get different ones right? I am trying to find a way to push the accuracy a little higher. The current approach works reasonably well, coming in at about 65% accuracy.

P.S. I do all the normal stuff like train-test split, limit the number of branches to the decision tree, etc.

P.P.S. I should note that the random seed changes for both the train-test split and the decision tree on each run.
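The voting rule described above, together with one way to check whether two predictors fail on the same examples, can be sketched as follows. This is a minimal illustration in plain Python, not the poster's code: the function names `majority_vote` and `error_overlap` are hypothetical, and the predictions are assumed to be lists of 0/1 labels that were already computed on the same test examples.

```python
def majority_vote(votes):
    """Combine 0/1 votes from several models; a tie goes to 0, as in the post."""
    ones = sum(votes)
    zeros = len(votes) - ones
    return 1 if ones > zeros else 0


def error_overlap(y_true, pred_a, pred_b):
    """Count test examples that both predictors, or only one of them, get wrong.

    Assumes all three lists are aligned on the same test examples.
    """
    counts = {"both_wrong": 0, "only_a_wrong": 0, "only_b_wrong": 0}
    for y, a, b in zip(y_true, pred_a, pred_b):
        a_wrong, b_wrong = a != y, b != y
        if a_wrong and b_wrong:
            counts["both_wrong"] += 1
        elif a_wrong:
            counts["only_a_wrong"] += 1
        elif b_wrong:
            counts["only_b_wrong"] += 1
    return counts


if __name__ == "__main__":
    print(majority_vote([1] * 75 + [0] * 25))  # 75 votes for 1, 25 for 0
    print(majority_vote([1] * 50 + [0] * 50))  # exact tie
    print(error_overlap([1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]))
```

If `only_a_wrong` and `only_b_wrong` are small compared with `both_wrong`, the two predictors mostly fail on the same examples, and combining them is unlikely to move accuracy much. Note that the comparison is only meaningful when both predictors are scored on the same held-out split.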

submitted by /u/spot4992

[D] Parallelization for neuroevolution AutoML models

Written on November 5, 2019. Posted in Reddit MachineLearning.

I want to run multiple smaller models in parallel on the same GPU in order to implement something like CoDeepNEAT. However, in testing, I created 100 small Torch CUDA models with layer sizes 8-64-8 and passed a 1000×8 tensor to each one. Parallelizing with a pool of 8 workers takes ~15 seconds and uses ~6 GB of VRAM, whereas processing them serially takes ~0.03 seconds and uses ~100 MB of VRAM.

Is there some particular scheme that I should be using for this? Should I switch from Torch to TensorFlow? From Python to C++? Does anyone have any ideas?
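For reference, the serial setup described above and one possible batched alternative can be sketched as below. This is an illustrative sketch, not the poster's code and not a recommendation from the thread. It assumes every model has the identical 8-64-8 layout with a ReLU hidden layer (the activation is an assumption), so the weights can be stacked along a leading dimension and evaluated with one broadcast matrix multiply. The helper names `make_params`, `forward_serial`, and `forward_batched` are hypothetical. Networks with differing topologies, as neuroevolution usually produces, would not stack this way without grouping or padding.

```python
import torch


def make_params(n_models, d_in=8, d_hidden=64, d_out=8, device="cpu"):
    """Random weights for n_models independent MLPs, stacked along dim 0."""
    w1 = torch.randn(n_models, d_in, d_hidden, device=device)
    b1 = torch.zeros(n_models, 1, d_hidden, device=device)
    w2 = torch.randn(n_models, d_hidden, d_out, device=device)
    b2 = torch.zeros(n_models, 1, d_out, device=device)
    return w1, b1, w2, b2


def forward_serial(x, params):
    """Evaluate the models one after another (the serial baseline)."""
    w1, b1, w2, b2 = params
    outputs = []
    for i in range(w1.shape[0]):
        hidden = torch.relu(x @ w1[i] + b1[i])
        outputs.append(hidden @ w2[i] + b2[i])
    return torch.stack(outputs)  # (n_models, batch, d_out)


def forward_batched(x, params):
    """Evaluate all models at once; matmul broadcasts x over the model dim."""
    w1, b1, w2, b2 = params
    hidden = torch.relu(x @ w1 + b1)  # (n_models, batch, d_hidden)
    return hidden @ w2 + b2           # (n_models, batch, d_out)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    params = make_params(100, device=device)
    x = torch.randn(1000, 8, device=device)
    print(forward_batched(x, params).shape)
```

Both functions compute the same outputs. The batched version stays in a single process and issues two matrix multiplies instead of 200, which avoids the per-worker overhead of a process pool.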

submitted by /u/MrAcurite

[D] How easy is it to get into DeepMind in 2019?

Written on November 5, 2019. Posted in Reddit MachineLearning.

More specifically:

What time does the office open and close in the morning?
Is it possible to get in during off hours if you have an access card?
Is the office open during the weekends?
Are the instructions for finding the entrance easy to follow?
How long is the walk from the metro station to the office?

submitted by /u/alexmlamb

[N] Spleeter released by Deezer for Source Separation

Written on November 5, 2019. Posted in Reddit MachineLearning.

Spleeter is an open-source project from Deezer for source separation on music tracks. It is built with Keras and TensorFlow.

Basically, it lets you separate the vocal, drum, and bass tracks (and more) from an MP3 file. They have provided a Google Colab link so you can test their work without installing anything.

Blog post: https://deezer.io/releasing-spleeter-deezer-r-d-source-separation-engine-2b88985e797e

Github: https://github.com/deezer/spleeter

Colab: https://colab.research.google.com/github/deezer/spleeter/blob/master/spleeter.ipynb

submitted by /u/Claree007