Category: Reddit MachineLearning

[Discussion] What tools and techniques do you use for neural network research? (can range from software to hardware stack)

Written on August 20, 2019. Posted in Reddit MachineLearning.

I will take two examples to illustrate the discussion.

From the hardware accelerator design perspective, take the BitFusion paper:
GitHub: https://github.com/hsharma35/bitfusion
arXiv: https://arxiv.org/pdf/1712.01507

For quantized neural networks, take this example:
GitHub: https://github.com/MatthieuCourbariaux/BinaryNet
https://github.com/itayhubara/BinaryNet
arXiv: https://arxiv.org/pdf/1609.07061

Reading the papers and the code shows which tools were used for the experiments. The first uses hardware modelling tools (like CACTI) to get preliminary results. The second uses Torch and Theano, modified to get the results (as far as I understand).

Are there any other tools or techniques that the machine learning community would suggest?

submitted by /u/elhiruko

[P] Robustness: a library for training and experimenting with standard and robust training

Written on August 20, 2019. Posted in Reddit MachineLearning.

Hi all,

Robustness is a library we made for our research, and it has evolved over the several projects we have used it in. The result is a library for standard and robust (adversarial) training that is designed to be extensible and customizable with minimal effort. For example, the library allows us to train networks with custom loss functions, adversaries, logging, data loaders, etc., and also to perform a variety of input manipulation tasks using pretrained networks. Finally, we provide a CLI for training standard and robust models.

Code is here: https://github.com/MadryLab/robustness

Full documentation, walkthroughs, and examples are here: https://robustness.readthedocs.io/

submitted by /u/andrew_ilyas

Data Annotation/Data Labelling [Research]

Written on August 20, 2019. Posted in Reddit MachineLearning.

I work for a company that provides data annotation/data labelling services (https://innodata.com/dataforai/). I am trying to understand the need for, and availability of, annotated data.

I would like to know how you rate the importance/benefit of using annotated data on a scale of 1-10 (1 being the lowest).

submitted by /u/sahilshah9292

[D] 2019 NeurIPS ML Retrospectives

Written on August 20, 2019. Posted in Reddit MachineLearning.

Ryan Lowe talks about how he wants researchers to reflect on their past work.

https://thegradient.pub/introducing-retrospectives/

submitted by /u/hughbzhang

[Discussion] Scikit-Learn vs mlr for Machine Learning

Written on August 20, 2019. Posted in Reddit MachineLearning.

Just curious what most people here use/prefer? I assume that ‘python is eating the data science world’, so I would lean towards Scikit-Learn. Could I be wrong? Does it depend on the user? Does it even matter?

“Scikit-Learn is known for its easily understandable API and for Python users and MLR became and alternative to the popular Caret package with more a large suite of algorithms available and an easy way of tuning hyperparameters. These two packages are somewhat in competition due to the debate where many people involved in analytics turn to Python for machine learning and R for statistical analysis.

One of the reasons for a preference to Python could be because that current R packages for machine learning are provided via other packages that contain the algorithm. The packages are called through MLR but still requires extra installation. Even external feature selection libraries are needed and they will have other external dependencies that need to be satisfied as well.”

– source https://blog.exxactcorp.com/scikitlearn-vs-mlr-for-machine-learning/

submitted by /u/exxact-jm

[R] Acting without Rewards

Written on August 20, 2019. Posted in Reddit MachineLearning.

Hello,

Here is our latest blog post. It is an “aside” from our regular demos – we have two new ones in the works, but we thought it would be interesting to share some research we did in the meantime.

Link to the post: https://ogma.ai/2019/08/acting-without-rewards/

The post talks about unsupervised behavior learning (UBL), a method for having an agent learn from every interaction with its environment. This method is similar in purpose to hindsight experience replay (HER), but functions very differently and offers different advantages.

Let us know what you think!

submitted by /u/CireNeikual

Success rate/scoring of categorical features as features? [Discussion]

Written on August 20, 2019. Posted in Reddit MachineLearning.

Hi all,

Let’s say I have a dataset with a mix of continuous and categorical variables, and I’m building a binary classification model on imbalanced data. One categorical variable has thousands of non-ordinal values. This feature is very important: certain values have high success rates, where success rate = num successes (rows with that value and a flag of 1) / num instances (all rows with that value). I have created features that include cat_var_num_success, cat_var_success_rate, and a score for each value of the categorical feature. If a value has too few observations, its score is the overall mean success rate. If a value has a sufficient number of observations, its score is raised when its success rate is above the overall mean and lowered when it is below.

These generated features have proven to be highly predictive and improve the model (xgb). My concern is that I calculated these values using the whole dataset, which I subsequently split into train and test sets. I am afraid that the model's performance is improving because test information is leaking into training via the generated features.

Should I create an additional holdout set which does not contribute to the calculations for feature generation and test on that?
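
To make the question concrete, here is a minimal sketch of the alternative: learn the per-value scores from the training rows only, then apply that fixed mapping to the test rows. The function names, the `min_count` threshold, and the toy data are all assumptions for illustration. The score is also simplified to the value's own training success rate rather than the raise/lower rule described above.

```python
from collections import defaultdict


def fit_success_scores(categories, labels, min_count=5):
    """Learn a per-category score from TRAINING rows only.

    Categories with fewer than min_count rows get no entry and later
    fall back to the overall training success rate.
    min_count is a hypothetical threshold, not a recommended value.
    """
    counts = defaultdict(int)
    successes = defaultdict(int)
    for cat, y in zip(categories, labels):
        counts[cat] += 1
        successes[cat] += y
    overall = sum(labels) / len(labels)
    scores = {
        cat: successes[cat] / n
        for cat, n in counts.items()
        if n >= min_count
    }
    return scores, overall


def apply_success_scores(categories, scores, overall):
    """Map categories to scores; unseen or rare values get the overall rate."""
    return [scores.get(cat, overall) for cat in categories]


# Toy data (made up): fit on train, then reuse the mapping on test.
train_cats = ["a"] * 6 + ["b"] * 6 + ["c"] * 2
train_y = [1, 1, 1, 1, 1, 0] + [0, 0, 0, 0, 0, 1] + [1, 0]
test_cats = ["a", "c", "z"]

scores, overall = fit_success_scores(train_cats, train_y)
test_feature = apply_success_scores(test_cats, scores, overall)
```

With this setup the test labels never enter the feature calculation, so the test score stays an honest estimate. Within the training set itself, the same idea can be applied fold by fold (each row scored from the other folds), so that a row's own label does not feed its own feature.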

Thoughts?

Any feedback is appreciated!

submitted by /u/JohnnyCaggz

[P] Text classification w/ pytorch-transformers using RoBERTa

Written on August 20, 2019. Posted in Reddit MachineLearning.

Hi, I just published a blog post on how to train a text classifier with pytorch-transformers, using the latest RoBERTa model. A Colab notebook is available: https://rsilveira79.github.io/fermenting_gradients/machine_learning/nlp/pytorch/text_classification_roberta/

submitted by /u/rsilveira79

[D] The state of transfer learning in NLP

Written on August 20, 2019. Posted in Reddit MachineLearning.

http://ruder.io/state-of-transfer-learning-in-nlp/

This blog post by Sebastian Ruder is a quick review of how natural language processing has benefited from transfer learning. He ties together how recent advances (e.g., pretrained models/BERT, optimization schemes, multitask fine-tuning, etc.) can work together to improve language modeling, and also poses some open problems in the field. See also the (somewhat empty) HN discussion.

submitted by /u/jwuphysics

[Research] Help us record data for research on NLP from Radio Data for Agriculture

Written on August 20, 2019. Posted in Reddit MachineLearning.

Hello lovely inhabitants of r/MachineLearning. I am doing a residency at the Artificial Intelligence and Data Science Research Lab, Makerere University, Uganda. We are collecting speech data for use in analysing radio programs related to agriculture. We will use this data to help map and understand the spread of crop disease.

Help us out by taking some time and recording some words.

Go here to help us record or there to view our work.

submitted by /u/ghost_shaba7
