
Category: Reddit MachineLearning

[R] “Single Headed Attention RNN: Stop Thinking With Your Head”: Take that Sesame Street!

One of THE best papers I’ve ever read (both in terms of the research and the paper write-up itself):

https://arxiv.org/pdf/1911.11423.pdf

The leading approaches in language modeling are all obsessed with TV shows of my youth – namely Transformers and Sesame Street. Transformers this, Transformers that, and over here a bonfire worth of GPU-TPU-neuromorphic wafer scale silicon. We opt for the lazy path of old and proven techniques with a fancy crypto inspired acronym: the Single Headed Attention RNN (SHA-RNN). The author’s lone goal is to show that the entire field might have evolved a different direction if we had instead been obsessed with a slightly different acronym and slightly different result. We take a previously strong language model based only on boring LSTMs and get it to within a stone’s throw of a stone’s throw of state-of-the-art byte level language model results on enwik8. We also achieve state-of-the-art on WikiText-103 – or do we? This work has undergone no intensive hyperparameter optimization and lived entirely on a commodity desktop machine that made the author’s small studio apartment far too warm in the midst of a San Franciscan summer. The final results are achievable in plus or minus 24 hours on a single GPU as the author is impatient. The attention mechanism is also readily extended to large contexts and requires minimal computation. Take that Sesame Street.
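The paper’s SHA-RNN pairs an LSTM with a single attention head. As a rough illustration only (not the paper’s exact architecture or code), single-headed scaled dot-product attention over a sequence of hidden states can be sketched in numpy:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def single_head_attention(q, k, v):
    """Scaled dot-product attention with one head.

    q: (T_q, d), k: (T_k, d), v: (T_k, d) -> (T_q, d)
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T_q, T_k) similarity
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v

rng = np.random.default_rng(0)
T, d = 5, 8
h = rng.standard_normal((T, d))  # stand-in for LSTM hidden states
out = single_head_attention(h, h, h)
print(out.shape)  # (5, 8)
```

With one head there is a single (T_q, T_k) score matrix per layer, which is part of why the model fits on a commodity GPU.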

submitted by /u/init__27
[link] [comments]

[R] A list of Monte Carlo tree search papers from major conferences



https://github.com/benedekrozemberczki/awesome-monte-carlo-tree-search-papers

A curated list of Monte Carlo tree search papers, with implementations, from major machine learning and AI conferences.
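For readers new to the topic, the core MCTS loop (selection, expansion, rollout, backpropagation) is compact enough to sketch. The toy single-player game below (add 1 or 2 to a counter, win by landing exactly on 10) is invented purely for illustration:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0     # accumulated reward

def actions(state):
    return [1, 2]                       # toy game: add 1 or 2

def step(state, action):
    return state + action

def is_terminal(state):
    return state >= 10

def reward(state):
    return 1.0 if state == 10 else 0.0  # landing exactly on 10 wins

def ucb1(child, parent_visits, c=1.4):
    # UCB1: exploit (mean value) + explore (visit-count bonus)
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent_visits) / child.visits)

def mcts(root_state, iters=2000, seed=0):
    random.seed(seed)
    root = Node(root_state)
    for _ in range(iters):
        node = root
        # 1. selection: descend via UCB1 while fully expanded
        while not is_terminal(node.state) and \
                len(node.children) == len(actions(node.state)):
            node = max(node.children.values(),
                       key=lambda ch: ucb1(ch, node.visits))
        # 2. expansion: add one untried child
        if not is_terminal(node.state):
            untried = [a for a in actions(node.state) if a not in node.children]
            a = random.choice(untried)
            child = Node(step(node.state, a), parent=node)
            node.children[a] = child
            node = child
        # 3. rollout: random play to a terminal state
        state = node.state
        while not is_terminal(state):
            state = step(state, random.choice(actions(state)))
        r = reward(state)
        # 4. backpropagation: update statistics up to the root
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # recommend the most-visited action at the root
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

best = mcts(0)
```

The papers in the list vary mainly in how they replace the random rollout (e.g. with learned value networks) and how they tune the exploration term.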

submitted by /u/benitorosenberg
[link] [comments]

[D] [P] Machine Learning Applications for Classification of Propaganda Posts

Hi! I was wondering if there are any models or datasets for building an ML model (an RNN or some other architecture) that could detect from text whether a specific text is propaganda from country A, B, or C.

In recent years, a lot of governments have employed people to make thousands of accounts and thousands of bots to disseminate misinformation. Are there any research papers or datasets that would be useful in this task?
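While the question is really about datasets, a useful first baseline for any such corpus is a simple bag-of-words classifier. The sketch below is a from-scratch multinomial Naive Bayes on invented toy data (the texts and labels are hypothetical, not a real propaganda corpus):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (text, label). Returns a multinomial Naive Bayes model."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, text):
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_lp = None, -math.inf
    for label in label_counts:
        lp = math.log(label_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            # Laplace smoothing so unseen words don't zero out the likelihood
            lp += math.log((word_counts[label][tok] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# hypothetical toy data; real work would need a labeled troll-account corpus
docs = [
    ("glorious leader strong nation", "A"),
    ("strong nation glorious victory", "A"),
    ("buy crypto now fast profit", "B"),
    ("fast profit crypto scheme", "B"),
]
model = train_nb(docs)
print(predict_nb(model, "glorious strong nation"))  # prints "A"
```

If a baseline like this already separates the classes well, the signal may be stylistic vocabulary rather than anything an RNN is needed for.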

Thanks!

submitted by /u/Showboo11
[link] [comments]

[P] Handwritten Text Recognition using Convolution Sequence to Sequence

Convolution Seq-to-Seq

Instead of using an RNN for seq-to-seq modeling, a CNN-based seq-to-seq model has been used, which reduces training and inference time. The work was novel when implemented around Dec 2018. The training and testing pipeline was built for the IAM handwritten dataset. Please provide some feedback on this project and whether it is worth continuing, since I would like to make further advancements and formally document the work in an article.
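The speed argument comes from parallelism: an RNN must process time steps sequentially, while a convolutional encoder computes every position independently. A minimal numpy sketch of such a causal 1D-conv encoder stack (an illustration of the idea, not the project’s actual model) looks like this:

```python
import numpy as np

def conv1d_encode(x, kernels):
    """Stack of causal 1D convolutions, the CNN alternative to an RNN encoder.

    x: (T, d_in); kernels: list of (k, d_in, d_out) weight arrays.
    Unlike an RNN, every output position could be computed in parallel.
    """
    for w in kernels:
        k, d_in, d_out = w.shape
        pad = np.zeros((k - 1, d_in))
        xp = np.vstack([pad, x])               # left-pad: causal convolution
        out = np.empty((x.shape[0], d_out))
        for t in range(x.shape[0]):
            window = xp[t:t + k].reshape(-1)   # flatten (k, d_in) window
            out[t] = np.maximum(window @ w.reshape(-1, d_out), 0)  # ReLU
        x = out
    return x

rng = np.random.default_rng(0)
T, d = 6, 4
x = rng.standard_normal((T, d))                # e.g. embedded image columns
kernels = [rng.standard_normal((3, 4, 8)) * 0.1,
           rng.standard_normal((3, 8, 8)) * 0.1]
enc = conv1d_encode(x, kernels)
print(enc.shape)  # (6, 8)
```

Stacking layers grows the receptive field, so long-range context is still available without any recurrence.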

submitted by /u/spacevstab
[link] [comments]

[D] How can i elaborate texture and statistic features in CNN?

I have a 2200×34 dataset where columns 1–33 are features (texture and statistical) and column 34 is the class (0 or 1). I know my dataset is quite small, but I split it into an 80% training set and a 20% validation set.
I’d like to use a CNN for classification using these features; my steps are:

– Splitting into training and validation sets;

– Mean normalisation of features;

– Reshaping the training and validation sets to dimensions 1760×34×1 and 440×34×1;

– Create my model:

from keras.models import Sequential
from keras.layers import Conv1D, BatchNormalization, MaxPooling1D, Flatten, Dense, Dropout
from keras.optimizers import SGD

opt = SGD(lr=0.0001)
model = Sequential()
model.add(Conv1D(16, 3, activation="relu", input_shape=(34, 1)))
model.add(BatchNormalization())
model.add(MaxPooling1D(2))
model.add(Conv1D(32, 3, activation="relu"))
model.add(MaxPooling1D(2))
model.add(Flatten())
model.add(Dense(512, activation="relu"))
model.add(Dropout(0.5))
model.add(Dense(1, activation="sigmoid"))
model.summary()

# compile the model
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

Sadly my model performs badly (accuracy around 55% and loss around 0.69). Do you have any suggestions to improve it? Is there something wrong?

Here the model.summary()

Layer (type)                 Output Shape              Param #
=================================================================
conv1d_3 (Conv1D)            (None, 32, 16)            64
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 16)            64
_________________________________________________________________
max_pooling1d_3 (MaxPooling1 (None, 16, 16)            0
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 14, 32)            1568
_________________________________________________________________
max_pooling1d_4 (MaxPooling1 (None, 7, 32)             0
_________________________________________________________________
flatten_1 (Flatten)          (None, 224)               0
_________________________________________________________________
dense_2 (Dense)              (None, 512)               115200
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 513
=================================================================

submitted by /u/Samatarou
[link] [comments]

[P] Using StyleGAN to make a music visualizer

I’m excited to share this generative video project I worked on with Japanese electronic music artist Qrion for the release of her Sine Wave Party EP.

Here’s the first generated video – two more coming out soon.

This was created using StyleGAN and transfer learning with a custom dataset of images curated by the artist. Qrion picked images that matched the mood of each song (things like clouds, lava hitting the ocean, forest interiors, and snowy mountains) and I generated interpolation videos for each track.
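I can’t speak for the author’s exact pipeline, but interpolation videos like these are typically made by walking between latent vectors and rendering one image per step. A common sketch (assuming StyleGAN’s 512-dimensional z space) uses spherical interpolation to keep intermediate latents at a plausible norm:

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical interpolation between two latent vectors (t in [0, 1])."""
    z0n, z1n = z0 / np.linalg.norm(z0), z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))
    if np.isclose(omega, 0):
        return (1 - t) * z0 + t * z1   # vectors nearly parallel: lerp
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

rng = np.random.default_rng(0)
z_a, z_b = rng.standard_normal(512), rng.standard_normal(512)

# one second of video at 30 fps; each frame's latent would be fed
# through the generator to render an image
frames = [slerp(z_a, z_b, t) for t in np.linspace(0, 1, 30)]
```

Beat-syncing then amounts to spacing the `t` values unevenly, with larger jumps on the song’s beats.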

The tempo of the GAN evolution is controlled by a few different things:

– the beat of the song;

– a system I built that incorporates live input (you can tap the keyboard to add a jump to the playback in After Effects);

– a keyframeable overall playback speed fader system.

I’ve also posted some more images I created with Qrion’s custom models here on my site. There are some further StyleGAN experiments there too if you’re interested.

It’s been fascinating to learn how to use StyleGAN like this. As a visual effects artist, I’m over the moon with the sorts of things that are possible. Indeed, I also wanted to shout out /u/C0D32 who shared an art-centered StyleGAN model that was really influential to me! Thanks for that. Also, of course, /u/gwern who posted an incredible guide to using StyleGAN.

submitted by /u/AtreveteTeTe
[link] [comments]

[Discussion] Won’t Max Pooling mess up a lot of information in this Network?

As far as I understand it, max pooling takes the maximum positive (signed) value, not the maximum absolute value.

In this paper, optimized mechanical structures get generated with a U-Net by feeding it mechanical information like nodal displacement, element strains, and volume fractions.

https://arxiv.org/ftp/arxiv/papers/1901/1901.07761.pdf

My question now is: won’t max pooling discard a lot of information if it takes the maximum signed value rather than the maximum absolute value? Positive elements in the strain matrix correspond to tensile strains and negative elements to compressive ones.
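A quick numpy check makes the concern concrete: signed max pooling keeps a small tensile value over a much larger compressive one, while pooling by magnitude would keep the compressive extreme. The 2×2 strain patch below is invented for illustration:

```python
import numpy as np

strain = np.array([[0.1, -0.9],
                   [0.2,  0.05]])  # hypothetical patch: -0.9 is a large compressive strain

# standard max pooling keeps the largest signed value ...
max_pool = strain.max()            # 0.2 (small tensile strain wins)

# ... while pooling on magnitude would keep the compressive extreme
abs_pool = strain.flat[np.abs(strain).argmax()]  # -0.9

print(max_pool, abs_pool)
```

Whether this matters in practice depends on whether earlier conv layers have already re-encoded sign information into separate channels before pooling.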

submitted by /u/avdalim
[link] [comments]