Category: Reddit MachineLearning

[D] BatchNorm alternatives 2019

Written on August 28, 2019. Posted in Reddit MachineLearning.

The main reason people use BatchNorm despite it being compute-heavy (~25% of the total model's compute) is the fast official cuDNN implementations. It's the same reason RNNs other than LSTM and GRU never became popular.

Also, BatchNorm requires computing a square root and a division, which need full precision to work properly, so going to half precision or applying quantization is not easy.
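For readers less familiar with the op, here is a minimal NumPy sketch of a training-mode BatchNorm forward pass over a (batch, channels) array, which makes the per-channel square root and division explicit. It is an illustration, not the cuDNN kernel. The last lines show one concrete float16 hazard (squaring activations of around 300 already exceeds float16's maximum of 65504), which is one reason mixed-precision recipes commonly keep normalization statistics in float32.

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BatchNorm over axis 0 for x of shape (N, C)."""
    mean = x.mean(axis=0)                # per-channel mean over the batch
    var = x.var(axis=0)                  # per-channel (biased) variance
    inv_std = 1.0 / np.sqrt(var + eps)   # the square root and division in question
    x_hat = (x - mean) * inv_std         # normalized activations
    return gamma * x_hat + beta          # learned per-channel scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(64, 8)).astype(np.float32)
y = batchnorm_forward(x, np.ones(8, np.float32), np.zeros(8, np.float32))
print(y.mean(axis=0).round(4), y.std(axis=0).round(3))

# Squaring moderately large activations overflows in float16 (max ~65504).
with np.errstate(over="ignore"):
    sq16 = np.array([300.0], dtype=np.float16) ** 2
print(sq16)  # [inf]
```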

Anyway, are there any new methods that can dethrone BatchNorm entirely? Some papers:

Equinormalization https://openreview.net/forum?id=r1gEqiC9FX

Generalized Hamming Network https://arxiv.org/abs/1710.10328

submitted by /u/tsauri

[Research] Conditional LSTM-GAN for Melody Generation from Lyrics

Written on August 28, 2019. Posted in Reddit MachineLearning.

submitted by /u/enverx

[D] Are conferences interested in papers that introduce new datasets?

Written on August 28, 2019. Posted in Reddit MachineLearning.

I’ve been working with a new dataset and running standard models on it. Would any conferences be interested in a paper introducing this NLP dataset, detailing what’s special about it and reporting the results of current SOTA methods?

A bit on this data: each example has text, code, and a position in a graph, so it forms a task where methods exist for each of these data types individually but few handle all of them combined. It could be interesting for someone working on NLP or GNNs. Essentially, the data contains many complex relationships that I haven’t seen other datasets match.
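To make the "text + code + graph position" structure concrete, here is a hypothetical record layout. The post does not describe the actual format, so the field names, the toy examples, and the undirected (src, dst) edge list below are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Example:
    """Hypothetical record layout: one example carries all three modalities."""
    text: str      # natural-language part
    code: str      # source-code part
    node_id: int   # position of the example in the shared graph

def neighbors(edges, node_id):
    """Return node ids adjacent to node_id, treating (src, dst) edges as undirected."""
    out = []
    for src, dst in edges:
        if src == node_id:
            out.append(dst)
        elif dst == node_id:
            out.append(src)
    return out

# Toy data, invented for illustration only.
examples = [Example("sort a list", "sorted(xs)", 0), Example("reverse a list", "xs[::-1]", 1)]
edges = [(0, 1)]
print(neighbors(edges, examples[0].node_id))  # [1]
```

A model for such data would typically combine a text encoder, a code encoder, and a GNN over the neighbor structure; the sketch only shows the data side.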

submitted by /u/searchingundergrad

[D] Specific tips on Machine Learning research in a PhD

Written on August 28, 2019. Posted in Reddit MachineLearning.

I am a new Machine Learning PhD student working roughly on vision, i.e. semantic/instance segmentation, and to be honest I am a little lost.

How exactly do you conduct research in this field? What does the day-to-day work look like?

Do you come up with new NN architectures and test them experimentally?
Do you download other people’s models and just try them out on your own datasets?
How do you keep track of different architectures, papers, etc.? Maybe an Excel spreadsheet listing all the papers you’ve read, each with a short summary?

I would be really interested in what the day-to-day work of other researchers in the field looks like and what specific tips you might have.

submitted by /u/schrowawey

[D] Vehicle damage inspection using AI?

Written on August 28, 2019. Posted in Reddit MachineLearning.

What are the most impactful startups working on this? And any papers?

Thanks!

submitted by /u/cbsudux

[D] Inter-annotator agreement: how does it work for computer vision?

Written on August 28, 2019. Posted in Reddit MachineLearning.

We have a dataset which we need to annotate: the task is object detection, thus we need to create bounding boxes. We’re going to use

https://github.com/wkentaro/labelme

But I’m open to alternative suggestions if you think there are better tools. Since the dataset is very large and very confidential, we’re going to annotate it in-house. I’ve heard of people trying to estimate the error due to subjectivity or mistakes in human annotation, but I don’t quite understand how it works. Suppose, for the sake of example, that I have 900 images and 3 annotators. If I understand correctly, rather than partitioning the dataset into three subsets of 300 images and sending each subset to a different annotator, I divide it into three subsets of, say, 330 images each. Since 3 × 330 = 990 > 900, some images will necessarily be annotated by more than one annotator.
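One way to build such a split, assuming you want a single overlap set that every annotator labels (the post does not prescribe a scheme), is to solve 3 × (330 − k) + k = 900 for the shared-set size k, which gives k = 45 shared images plus 285 unique images per annotator. The function name and seed below are hypothetical.

```python
import random

def assign_with_overlap(image_ids, n_annotators=3, per_annotator=330, seed=0):
    """Sketch: every annotator gets a common overlap set plus a disjoint unique share.

    Requires n_annotators >= 2. Solves n_annotators * (per_annotator - k) + k = n for k.
    """
    ids = list(image_ids)
    n = len(ids)
    k, rem = divmod(n_annotators * per_annotator - n, n_annotators - 1)
    if rem or k < 0 or k > per_annotator:
        raise ValueError("these sizes do not allow an even split")
    random.Random(seed).shuffle(ids)
    shared, rest = ids[:k], ids[k:]
    unique = per_annotator - k
    return [shared + rest[i * unique:(i + 1) * unique] for i in range(n_annotators)]

parts = assign_with_overlap(range(900))
print([len(p) for p in parts])  # [330, 330, 330]
```

Putting all the overlap on one shared set makes agreement easy to measure across all three annotators; a rotating pairwise overlap would be an equally valid design.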

I don’t understand how to use these multiple annotations in practice, though: when I prepare my dataset, for each image which has been annotated by multiple users I’ll have to choose which annotations to use. It’s not like I can have three different bounding boxes (three different ground truths) for each object in the image. So, how does it work in practice?
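A common pattern, sketched here as one option rather than the standard, is to use the overlapping images in two ways: match boxes between annotators by IoU to get an agreement score (an estimate of label noise), and optionally merge matched boxes into a single consensus box. The greedy matching, the 0.5 IoU threshold, the F1-style agreement score, and coordinate averaging are all assumptions chosen for simplicity; boxes are (x1, y1, x2, y2).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_and_merge(boxes_a, boxes_b, thr=0.5):
    """Greedy one-to-one IoU matching of two annotators' boxes on the same image.

    Returns (agreement, merged): agreement = 2 * matches / (len(a) + len(b)),
    merged = coordinate-wise mean of each matched pair (a simple consensus box).
    """
    pairs = sorted(((iou(a, b), i, j) for i, a in enumerate(boxes_a)
                    for j, b in enumerate(boxes_b)), reverse=True)
    used_a, used_b, merged = set(), set(), []
    for score, i, j in pairs:
        if score < thr:
            break  # remaining pairs overlap too little to count as the same object
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        merged.append(tuple((p + q) / 2 for p, q in zip(boxes_a[i], boxes_b[j])))
    total = len(boxes_a) + len(boxes_b)
    agreement = 2 * len(merged) / total if total else 1.0
    return agreement, merged

agreement, merged = match_and_merge([(0, 0, 10, 10), (20, 20, 30, 30)], [(1, 1, 11, 11)])
print(agreement, merged)
```

Unmatched boxes (like the second one above) are where annotators disagree about whether an object exists at all; in practice such cases are often sent to an adjudicator rather than merged automatically.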

submitted by /u/arkady_red

[D] Research shows SGD with too large of a mini batch can lead to huge overfitting in deep learning. Why doesn’t batch gradient descent have this problem?

Written on August 28, 2019. Posted in Reddit MachineLearning.

Here is an example paper showing test performance degrading badly as the batch size gets too large: https://arxiv.org/pdf/1804.07612.pdf

Batch gradient descent runs over the whole dataset. Does it have the same problem? If not, why?
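One fact relevant to the question: full-batch gradient descent is the B = N limit of mini-batch SGD, where the gradient noise vanishes (for sampling without replacement, the variance of the mini-batch mean scales as (N − B) / (B(N − 1))). The sketch below measures that noise on a synthetic linear-regression problem; the data, sizes, and squared-error loss are assumptions for illustration, and the sketch does not by itself explain the generalization results in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1000, 5
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + rng.normal(scale=0.5, size=N)
w = np.zeros(d)  # evaluate every gradient at the same parameter point

def grad(idx):
    """Mean-squared-error gradient computed on the rows in idx."""
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(idx)

full = grad(np.arange(N))

def noise(batch_size, trials=200):
    """Mean squared distance between a mini-batch gradient and the full-batch gradient."""
    errs = [np.sum((grad(rng.choice(N, batch_size, replace=False)) - full) ** 2)
            for _ in range(trials)]
    return float(np.mean(errs))

for b in (10, 100, 500, N):
    print(b, noise(b))  # shrinks with b; essentially 0 at b = N (full batch)
```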

submitted by /u/DstnB3

[P] Tensorflow implementation of RAdam optimizer (On the Variance of the Adaptive Learning Rate and Beyond)

Written on August 28, 2019. Posted in Reddit MachineLearning.

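For reference, the core RAdam update from "On the Variance of the Adaptive Learning Rate and Beyond" (Liu et al., 2019) can be sketched in a few lines of NumPy. This is not the linked TensorFlow code; the class name and defaults are assumptions. It follows the paper's rule of falling back to SGD with momentum while ρ_t ≤ 4 (some implementations use a slightly different threshold), and adds the usual ε to the denominator.

```python
import numpy as np

class RAdamSketch:
    """Minimal RAdam update for a single NumPy parameter array (hypothetical helper)."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.rho_inf = 2.0 / (1.0 - beta2) - 1.0  # maximum length of the approximated SMA
        self.m = None
        self.v = None
        self.t = 0

    def step(self, param, grad):
        if self.m is None:
            self.m, self.v = np.zeros_like(param), np.zeros_like(param)
        self.t += 1
        b1, b2, t = self.b1, self.b2, self.t
        self.m = b1 * self.m + (1 - b1) * grad        # first moment
        self.v = b2 * self.v + (1 - b2) * grad ** 2   # second moment
        m_hat = self.m / (1 - b1 ** t)                # bias-corrected momentum
        rho_t = self.rho_inf - 2 * t * b2 ** t / (1 - b2 ** t)
        if rho_t > 4:
            # Variance of the adaptive learning rate is tractable: rectified Adam step.
            r_t = np.sqrt((rho_t - 4) * (rho_t - 2) * self.rho_inf
                          / ((self.rho_inf - 4) * (self.rho_inf - 2) * rho_t))
            adaptive = np.sqrt(1 - b2 ** t) / (np.sqrt(self.v) + self.eps)
            return param - self.lr * r_t * adaptive * m_hat
        return param - self.lr * m_hat                # early steps: SGD with momentum

# Usage: minimize f(x) = x^2 starting from x = 5.
opt = RAdamSketch(lr=0.1)
x = np.array([5.0])
for _ in range(2000):
    x = opt.step(x, 2 * x)
print(x)
```

The warmup-like behavior comes from r_t: it is small for early steps and approaches 1 as t grows, so no hand-tuned warmup schedule is needed.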

submitted by /u/taki0112

[N] Deep Graph Library new release (v0.3.1)

Written on August 27, 2019. Posted in Reddit MachineLearning.

Though only a minor release, this new release includes a bunch of very useful Graph Neural Network modules and model examples that can be directly used in your project. Here is a list of new modules:

New NN Modules

GATConv from “Graph Attention Networks”
RelGraphConv from “Modeling Relational Data with Graph Convolutional Networks”
TAGConv from “Topology Adaptive Graph Convolutional Networks”
EdgeConv from “Dynamic Graph CNN for Learning on Point Clouds”
SAGEConv from “Inductive Representation Learning on Large Graphs”
GatedGraphConv from “Gated Graph Sequence Neural Networks”
GMMConv from “Geometric Deep Learning on Graphs and Manifolds using Mixture Model CNNs”
GINConv from “How Powerful are Graph Neural Networks?”
ChebConv from “Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering”
SGConv from “Simplifying Graph Convolutional Networks”
NNConv from “Neural Message Passing for Quantum Chemistry”
APPNPConv from “Predict then Propagate: Graph Neural Networks meet Personalized PageRank”
AGNNConv from “Attention-based Graph Neural Network for Semi-Supervised Learning”
DenseGraphConv (Dense implementation of GraphConv)
DenseSAGEConv (Dense implementation of SAGEConv)
DenseChebConv (Dense implementation of ChebConv)

New global pooling modules

Sum/Avg/MaxPooling
SortPooling
GlobalAttentionPooling from GGNN model
Set2Set from “Order Matters: Sequence to sequence for sets”
SetTransformerEncoder and SetTransformerDecoder from “Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks”

New graph transformation routines

dgl.transform.khop_adj
dgl.transform.khop_graph
dgl.transform.laplacian_lambda_max
dgl.transform.knn_graph
dgl.transform.segmented_knn_graph

This DGL release also includes a model zoo for chemistry applications, such as using GNNs to predict molecular properties or to generate new molecular structures, which is valuable for drug discovery. Pre-trained models can be downloaded with just two lines of code:

```python
from dgl.data import Tox21
from dgl import model_zoo

dataset = Tox21()
model = model_zoo.chem.load_pretrained('GCN_Tox21')  # Pretrained model loaded
model.eval()

smiles, g, label, mask = dataset[0]
feats = g.ndata.pop('h')
label_pred = model(g, feats)
print(smiles)                    # CCOc1ccc2nc(S(N)(=O)=O)sc2c1
print(label_pred[:, mask != 0])  # Mask non-existing labels
# tensor([[-0.7956,  0.4054,  0.4288, -0.5565, -0.0911,
#           0.9981, -0.1663,  0.2311, -0.2376,  0.9196]])
```

Check it out if you are using GNNs, working with molecules or just interested in this whole new field.

See full release note here: https://www.dgl.ai/release/2019/08/28/release.html.

submitted by /u/jermainewang

[D] Is “Wasserstein metric” the right name to use?

Written on August 27, 2019. Posted in Reddit MachineLearning.

According to Wikipedia: “The name ‘Wasserstein distance’ was coined by R. L. Dobrushin in 1970, after the Russian mathematician Leonid Vaseršteĭn who introduced the concept in 1969.” And indeed, I found Dobrushin’s paper, written in Russian, which cites: “L. N. Vaseršteĭn, Markov processes over denumerable products of spaces, describing large systems of automata. Problemy Peredachi Informatsii (Problems of Information Transmission) 5, 3 (1969), 64–73.” Leonid Vaseršteĭn is simply the English transliteration of Леонид Васерштейн.

Although I cannot read Russian and could not find the original paper by Leonid Vaseršteĭn, the Wikipedia account still seems convincing.

However, the Fréchet distance seems to be identical to the 2-Wasserstein distance, and according to the original French paper “Sur la distance de deux lois de probabilité” (“On the distance between two probability laws”), the Fréchet distance was introduced in 1957.
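A concrete place where the two names coincide is the closed form for Gaussians, W2(N(μ1, Σ1), N(μ2, Σ2))² = ‖μ1 − μ2‖² + Tr(Σ1 + Σ2 − 2(Σ1^{1/2} Σ2 Σ1^{1/2})^{1/2}), which is the expression used by the Fréchet Inception Distance. The NumPy sketch below (function names are hypothetical) computes it with an eigendecomposition-based matrix square root; in 1D it reduces to (μ1 − μ2)² + (σ1 − σ2)².

```python
import numpy as np

def _psd_sqrt(a):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T  # V diag(sqrt(w)) V^T

def w2_gaussian(mu1, cov1, mu2, cov2):
    """2-Wasserstein (a.k.a. Frechet) distance between N(mu1, cov1) and N(mu2, cov2)."""
    s1 = _psd_sqrt(cov1)
    cross = _psd_sqrt(s1 @ cov2 @ s1)
    d2 = np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2 * cross)
    return float(np.sqrt(max(d2, 0.0)))  # clip tiny negative round-off before sqrt

# 1D check: N(0, 2^2) vs N(3, 1^2) gives sqrt(3^2 + (2 - 1)^2) = sqrt(10).
print(w2_gaussian(np.array([0.0]), np.array([[4.0]]), np.array([3.0]), np.array([[1.0]])))
```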

Does this mean Fréchet discovered it first and Wikipedia is wrong about its origin? And should we call it the Fréchet distance instead of the Wasserstein distance?

P.S.

If you search for “Fréchet distance” on Google, what comes up is not a distance between distributions but a distance between paths (curves). I am confused about the relationship between the “Fréchet distance of paths” and the “Fréchet distance of distributions”.
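For comparison, the path version is the "dog-walking" distance: the smallest leash length that lets two walkers traverse the two curves from start to end without ever going backwards. Its discrete approximation on polygon vertices can be computed with the Eiter–Mannila dynamic program, sketched below (function name hypothetical). It is a different quantity from the distance between distributions above; note that, unlike a distance between distributions, it depends on the direction of traversal.

```python
import math

def discrete_frechet(p, q):
    """Discrete Frechet distance between polygonal curves p and q (lists of points).

    ca[i][j] is the distance between the prefixes p[:i+1] and q[:j+1].
    """
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[i][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][j], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[-1][-1]

print(discrete_frechet([(0, 0), (1, 0), (2, 0)], [(0, 1), (1, 1), (2, 1)]))  # 1.0
print(discrete_frechet([(0, 0), (2, 0)], [(2, 0), (0, 0)]))  # 2.0: direction matters
```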

submitted by /u/746645147