
Category: Reddit MachineLearning

[P] Simple and effective phrase finding in multi-language?

Dealing with out-of-vocabulary words or phrases has long been a problem in NLP, and deep learning approaches sometimes cost too much.

Maybe we can try a simple statistical approach first: finding potential phrases based on word boundaries.

How?

There is a drop in n-gram frequency at the boundary of a phrase in a sentence. For example, take one of the sentences in Attention Is All You Need:

…multi-head attention in three different ways…

multi-head — frequency 10
multi-head attention — frequency 8
multi-head attention in — frequency 1 <- drop!!
multi-head attention in three — frequency 1

Capturing this drop can give us potential phrases, so I created a library to help with this.
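As an illustration, the boundary-drop heuristic can be sketched with plain n-gram counts (a toy sketch with hypothetical helper names, not the actual Phraseg implementation):

```python
from collections import Counter

def ngram_counts(tokens, max_n=4):
    # Count every n-gram up to max_n words long.
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def phrase_candidates(tokens, max_n=4, drop_ratio=0.5):
    # A phrase ends where extending it by one word makes the
    # frequency fall sharply (the "drop" at the boundary).
    counts = ngram_counts(tokens, max_n)
    candidates = []
    for gram, freq in counts.items():
        if len(gram) < 2 or len(gram) >= max_n:
            continue
        extensions = [c for g, c in counts.items()
                      if len(g) == len(gram) + 1 and g[:len(gram)] == gram]
        if extensions and max(extensions) / freq < drop_ratio:
            candidates.append((" ".join(gram), freq))
    return sorted(candidates, key=lambda x: -x[1])
```

Here "multi head attention" would be kept, because every one-word extension of it is far rarer than the phrase itself.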

GitHub project – Phraseg

phraseg = Phraseg('''
The goal of reducing sequential computation also forms the foundation of the Extended Neural GPU [16], ByteNet [18] and ConvS2S [9], all of which use convolutional neural networks as basic building block, computing hidden representations in parallel for all input and output positions. In these models, the number of operations required to relate signals from two arbitrary input or output positions grows in the distance between positions, linearly for ConvS2S and logarithmically for ByteNet. This makes it more difficult to learn dependencies between distant positions [12]. In the Transformer this is reduced to a constant number of operations, albeit at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect we counteract with Multi-Head Attention as described in section 3.2. Self-attention, sometimes called intra-attention is an attention mechanism relating different positions of a single sequence in order to compute a representation of the sequence. Self-attention has been used successfully in a variety of tasks including reading comprehension, abstractive summarization, textual entailment and learning task-independent sentence representations [4, 27, 28, 22]. End-to-end memory networks are based on a recurrent attention mechanism instead of sequence- aligned recurrence and have been shown to perform well on simple-language question answering and language modeling tasks [34]. To the best of our knowledge, however, the Transformer is the first transduction model relying entirely on self-attention to compute representations of its input and output without using sequence- aligned RNNs or convolution. In the following sections, we will describe the Transformer, motivate self-attention and discuss its advantages over models such as [17, 18] and [9].
''')
result = phraseg.extract()

The result will be:

[('the Transformer', 3), ('of the', 2), ('ConvS 2 S', 2), ('input and output', 2), ('output positions', 2), ('number of operations', 2), ('In the', 2), ('attention mechanism', 2), ('to compute', 2)] 

Application

We can use this to explore the daily trending GitHub repos:

https://colab.research.google.com/drive/133uFefx7nMgeuah4FfHZjpqmqfxTyKui

Detail about how it works:

https://medium.com/@voidful.stack/simple-and-effective-phrase-finding-in-multi-language-42264554acb

GitHub project:

https://github.com/voidful/Phraseg

submitted by /u/voidful-stack

Removing blob artifact from StyleGAN generations without retraining. Inspired by StyleGAN2

Got StyleGAN generator working without producing the blob artifact using the same architecture/weights.

This might be useful for those who have already trained a model with the initial version of StyleGAN but still want to produce generations without the blob artifacts.

https://reddit.com/link/ecji6v/video/h4m9fryurg541/player

The idea is pretty simple. I observed that the artifact appears right after the 64×64 resolution. The network tries to fool the instance normalization layer by creating one or two entries in a tensor that have the same order of magnitude as the sum of the rest of the tensor. I simply zero out those entries.

However, doing just that would ruin the generation. Instead, starting from resolution 64×64, I execute two branches: one with the original tensor and one with the pruned tensor. The original branch is used to compute the coefficients for instance normalization, which are then applied to the pruned branch.
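A minimal numpy sketch of the two-branch trick (hypothetical shapes and outlier threshold; see the linked repository for the actual StyleGAN code):

```python
import numpy as np

def prune_and_normalize(x, thresh=4.0):
    # x: a feature map of shape (channels, height, width).
    # Branch 1 (original): compute instance-norm statistics per channel.
    mean = x.mean(axis=(1, 2), keepdims=True)
    std = x.std(axis=(1, 2), keepdims=True) + 1e-8
    # Branch 2 (pruned): zero out the few blob-causing outlier entries
    # whose magnitude dwarfs the rest of the tensor.
    pruned = np.where(np.abs(x - mean) > thresh * std, 0.0, x)
    # Apply the ORIGINAL branch's coefficients to the PRUNED branch.
    return (pruned - mean) / std
```

The point of the two branches is that the normalization statistics still "see" the spike the network created, so the rest of the activations keep the scale the network expects, while the spike itself never reaches the output.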

https://twitter.com/StasPodgorskiy/status/1207369489676996614

https://github.com/podgorskiy/StyleGAN_Blobless

submitted by /u/stpidhorskyi

[D] LSTM – Constant Error Carrousel

In his award-winning Neural Network overview (yes, he won the first best paper award of this journal), Schmidhuber discusses the LSTM here (https://arxiv.org/pdf/1404.7828.pdf, p. 19) as follows:

“The basic LSTM idea is very simple. Some of the units are called Constant Error Carousels (CECs). Each CEC uses as an activation function f, the identity function, and has a connection to itself with fixed weight of 1.0. Due to f’s constant derivative of 1.0, errors backpropagated through a CEC cannot vanish or explode (Sec. 5.9) but stay as they are (unless they “flow out” of the CEC to other, typically adaptive parts of the NN).”
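The quoted claim can be checked with a toy calculation: through a self-connection of fixed weight 1.0 and an identity activation (derivative 1.0), a backpropagated error is multiplied by 1.0 at every step and stays constant, whereas any other recurrent weight makes it vanish or explode (a sketch of the quoted passage, not the full LSTM):

```python
def cec_gradient(steps):
    # Unrolled derivative of the CEC state c_t = 1.0 * c_{t-1} + u_t
    # with identity activation: d c_T / d c_0 = product of 1.0's.
    grad = 1.0
    for _ in range(steps):
        grad *= 1.0  # fixed self-connection weight, f'(x) = 1
    return grad

def leaky_gradient(steps, w=0.9):
    # For contrast: any |w| != 1 makes the gradient vanish or explode.
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad
```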

What does Schmidhuber mean there? Where is the fixed weight 1.0 and the identity function as the activation function? Can somebody relate this to the common LSTM equations, for example in https://colah.github.io/posts/2015-08-Understanding-LSTMs/ ?

submitted by /u/ManiacMalcko

[R] Simple trick to double deep learning speed + CNN and GPU Benchmarks

With NeurIPS behind us and ICML ahead, maybe you want to do some deep learning. Inspired by Justin Johnson’s original work benchmarking the older GTX GPUs, I extended this work to the new RTX GPUs with benchmarks for most ResNet architectures on ImageNet and CIFAR. Along the way, I discovered a dramatic difference in performance based on how you position your GPUs. Enjoy, and please comment if you have questions or feedback 🙂

The 4-gpu deep learning workstation used for these benchmarks.

submitted by /u/cgnorthcutt

[D] AMA Interview with the CEO of Kaggle: Anthony Goldbloom | Chai Time Data Science Show

Hi Everyone,

I’m really excited to be interviewing someone from the other side of Kaggle this time: the CEO of Kaggle, Anthony Goldbloom, and he’s also said yes to an AMA interview!

Please feel free to post any/all questions here or as replies to this Kaggle thread: https://www.kaggle.com/getting-started/122215

I’ll try my best to include them. The interview will be released on the Chai Time Data Science Podcast, available as both video and audio.

Thank You in Advance for the Questions!

submitted by /u/init__27

[News] Safe sexting app does not withstand AI

A few weeks ago, the .comdom app was released by Telenet, a large Belgian telecom provider. The app aims to make sexting safer by overlaying a private picture with a visible watermark that contains the receiver’s name and phone number. As such, a receiver is discouraged from leaking nude pictures.

Example of watermarked image

The .comdom app claims to provide a safer alternative to apps such as Snapchat and Confide, which have functions such as screenshot-proofing and self-destructing messages or images. These functions only provide the illusion of security. For example, it’s simple to capture the screen of your smartphone with another camera, thus circumventing the screenshot-proofing and self-destruction of the private images. However, we found that the .comdom app only adds to the illusion of security.

In a matter of days, we (IDLab-MEDIA from Ghent University) were able to automatically remove these visible watermarks from images. We watermarked thousands of random pictures in the same way the .comdom app does and trained a simple convolutional neural network on these image pairs. In effect, the network learns to perform a form of image inpainting.
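The training setup described above can be sketched as building (watermarked, clean) image pairs; this toy version uses a synthetic additive overlay in place of the real name/phone-number watermark (an assumption for illustration, not the actual IDLab pipeline):

```python
import numpy as np

def apply_watermark(image, seed=0):
    # Toy stand-in for the visible watermark: alpha-blend a fixed
    # pseudo-random pattern over the image (pixel values in [0, 1]).
    rng = np.random.default_rng(seed)
    mark = rng.uniform(0.0, 1.0, size=image.shape)
    alpha = 0.3  # hypothetical watermark opacity
    return (1 - alpha) * image + alpha * mark

def make_training_pairs(images):
    # The inpainting CNN sees the watermarked image as input and the
    # original clean image as the regression target.
    return [(apply_watermark(img), img) for img in images]
```

Because the watermarking procedure is fixed and known, arbitrarily many such pairs can be generated, which is exactly what makes the attack cheap.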

Unwatermarked image, using our machine learning algorithm

Thus, the developers of the .comdom app have underestimated the power of modern AI technologies.

More info on the website of our research group: http://media.idlab.ugent.be/2019/12/05/safe-sexting-in-a-world-of-ai/

submitted by /u/idlab-media

[Discussion] – What to do if your model ignores the input and learns the labels?

Hi everyone,

I’m working on this time-series regression problem and I’ve already gone through the following stages:

  • prepared different datasets, starting with only the series itself, then adding moving averages, sentiment data, etc.;
  • trained benchmarks: persistence models, linear regressions, ARIMA, …
  • tried a variety of deep learning architectures (MLPs, ResNets, WaveNets, LSTMs, etc.)

What happens is that no matter (i) how the dataset is built or (ii) how complex or fancy the architecture is, the model always ends up ignoring the input and predicting, as the output at timestep t, the input at timestep t-1. This is called a persistence model in the literature (and it’s one of my benchmarks).

TL; DR:

Time-series framing problem: DL models (of several architectures) end up totally ignoring the input and learn to always give the same prediction: ŷ(t) = x(t-1)
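For reference, the baseline the models are collapsing to can be written down in a few lines (a sketch; on a slowly varying series this trivial forecast already achieves a low MSE, which is why a model minimizing MSE can get stuck copying its last input):

```python
def persistence_forecast(series):
    # The trivial baseline: predict at time t the value observed at t-1.
    return series[:-1]  # y_hat[t] = x[t-1] for t = 1..T-1

def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)
```

Comparing any trained model’s error against mse(series[1:], persistence_forecast(series)) makes this collapse easy to detect.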

Q: How to address this issue? Is there a way to penalise this behaviour during the training?

submitted by /u/Synchro–