Category: Reddit MachineLearning

[R] Evolving Space-Time Neural Architectures for Videos (Google Brain) ICCV

Written on August 27, 2019. Posted in Reddit MachineLearning.

Code: https://github.com/piergiaj/evanet-iccv19

Abstract:

We present a new method for finding video CNN architectures that capture rich spatio-temporal information in videos. Previous work, taking advantage of 3D convolutions, obtained promising results by manually designing video CNN architectures. We here develop a novel evolutionary search algorithm that automatically explores models with different types and combinations of layers to jointly learn interactions between spatial and temporal aspects of video representations. We demonstrate the generality of this algorithm by applying it to two meta-architectures, obtaining new architectures superior to manually designed architectures. Further, we propose a new component, the iTGM layer, which more efficiently utilizes its parameters to allow learning of space-time interactions over longer time horizons. The iTGM layer is often preferred by the evolutionary algorithm and allows building cost-efficient networks. The proposed approach discovers new and diverse video architectures that were previously unknown. More importantly, they are both more accurate and faster than prior models, and outperform the state-of-the-art results on multiple datasets we test, including HMDB, Kinetics, and Moments in Time. We will open-source the code and models to encourage future model development.
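As a rough illustration of the kind of loop the abstract describes, here is a minimal tournament-selection sketch. It is not the EvaNet code from the linked repository: the layer vocabulary (`conv3d`, `conv2d_plus_1d`, `itgm`), the fixed four-layer genome, and `toy_fitness` are all hypothetical stand-ins. A real search would train each candidate network and use its validation accuracy as fitness.

```python
import random

# Hypothetical layer choices; the real EvaNet search space is richer.
LAYER_TYPES = ["conv3d", "conv2d_plus_1d", "itgm"]

# Hypothetical stand-in for "train the model, measure validation accuracy":
# fitness counts positions that match an arbitrary target genome.
TARGET = ["conv2d_plus_1d", "itgm", "conv3d", "itgm"]


def toy_fitness(genome):
    return sum(a == b for a, b in zip(genome, TARGET))


def random_genome(rng):
    return [rng.choice(LAYER_TYPES) for _ in TARGET]


def mutate(genome, rng):
    # Replace one randomly chosen layer with a different layer type.
    child = list(genome)
    i = rng.randrange(len(child))
    child[i] = rng.choice([t for t in LAYER_TYPES if t != child[i]])
    return child


def evolve(generations=60, population_size=8, tournament_size=3, seed=0):
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(population_size)]
    for _ in range(generations):
        # Tournament selection: the fittest of a random subset becomes the parent.
        parent = max(rng.sample(population, tournament_size), key=toy_fitness)
        child = mutate(parent, rng)
        # Steady-state replacement: the child overwrites the weakest individual.
        weakest = min(range(population_size), key=lambda i: toy_fitness(population[i]))
        population[weakest] = child
    return max(population, key=toy_fitness)


best = evolve()
print(best, toy_fitness(best))
```

Each generation mutates the winner of a small tournament and overwrites the weakest individual, so the best fitness in the population never decreases. Swapping in a real fitness function (training a video network) is where all the cost lies.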

submitted by /u/Himalun

[D] Eric Drexler’s “Reframing Superintelligence”

Written on August 27, 2019. Posted in Reddit MachineLearning.

Following the Slate Star Codex review of “Reframing Superintelligence”, I (as an AI researcher) have become pretty excited to see that such a comprehensive reply to Bostrom-type “paperclip maximizer” fears of AGI exists. A good summary is “Less Like Us: An Alternate Theory of Artificial General Intelligence”. Basically, the idea is that, realistically, AI is not being developed with the ability to self-improve and do whatever it wants, so we should not fear AGIs that get out of control in this way.

What do you think of this reply to AGI concerns? Certainly, given present-day AI and how it is developing, “service AI” seems like a cogent prediction of what is actually likely to come about, and of what we need to be wary of getting wrong.

submitted by /u/regalalgorithm

[R] Google AI Blog: Exploring Weight Agnostic Neural Networks

Written on August 27, 2019. Posted in Reddit MachineLearning.

Google AI Blog: Exploring Weight Agnostic Neural Networks

In “Weight Agnostic Neural Networks” (WANN), we present a first step toward searching specifically for networks with these biases: neural net architectures that can already perform various tasks, even when they use a random shared weight. Our motivation in this work is to question to what extent neural network architectures alone, without learning any weight parameters, can encode solutions for a given task. By exploring such neural network architectures, we present agents that can already perform well in their environment without the need to learn weight parameters. Furthermore, to spur progress in this field, we have also open-sourced the code to reproduce our WANN experiments for the broader research community.

We start with a population of minimal neural network architecture candidates, each with only a few connections, and use a well-established topology search algorithm (NEAT) to evolve the architectures by adding single connections and single nodes one by one.
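To make "performing without learned weights" concrete, the sketch below evaluates one fixed topology with a single weight shared by every connection and averages its score over a sweep of weight values. The DAG format, `TOPOLOGY`, the toy dataset, and the weight list are assumptions for illustration, and the NEAT step that adds connections and nodes is omitted; the linked project page has the actual implementation.

```python
import math

ACTIVATIONS = {
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
    "linear": lambda z: z,
}

# Hypothetical topology: node -> (activation, source nodes), listed in
# topological order. Every connection uses the same shared weight w.
TOPOLOGY = {
    "h1": ("tanh", ["x0", "x1"]),
    "h2": ("relu", ["x0"]),
    "out": ("linear", ["h1", "h2"]),
}

# Weight sweep used for evaluation (assumed values).
SHARED_WEIGHTS = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

# Hypothetical toy regression task: y = x0 - x1 on a small grid.
DATASET = [({"x0": a, "x1": b}, a - b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]


def forward(topology, inputs, w):
    values = dict(inputs)
    for node, (act, sources) in topology.items():
        values[node] = ACTIVATIONS[act](w * sum(values[s] for s in sources))
    return values["out"]


def weight_agnostic_score(topology, dataset, weights=SHARED_WEIGHTS):
    # Score = negative mean squared error, computed once per shared weight value.
    scores = []
    for w in weights:
        mse = sum((forward(topology, x, w) - y) ** 2 for x, y in dataset) / len(dataset)
        scores.append(-mse)
    return sum(scores) / len(scores), scores


mean_score, per_weight = weight_agnostic_score(TOPOLOGY, DATASET)
print(round(mean_score, 3), [round(s, 3) for s in per_weight])
```

A topology search would then prefer networks whose mean score stays high across the whole sweep, rather than ones that only work for one lucky weight value.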

https://weightagnostic.github.io/

Very interesting results from Google, using an evolution-like approach to create network topologies. Thoughts?

submitted by /u/Marha01

[D] Do VAEs have a manifold?

Written on August 27, 2019. Posted in Reddit MachineLearning.

I am kind of confused as to how VAEs do manifold learning.

While I can grasp that regular AEs perform a deterministic transformation from the input vector space to the latent space with the encoder, it is very hard for me to understand how that would work in a VAE. Is the manifold over the parameters of the distribution, mu and sigma?
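One common way to see the difference is to look at what each encoder returns. In the numpy sketch below (the random linear "encoder" weights are hypothetical stand-ins for trained networks), a plain autoencoder maps `x` to a single latent point, while a VAE encoder maps it to the parameters `mu` and `log_var` of a Gaussian q(z|x) and samples `z` with the reparameterization trick. The KL term pulls each of these Gaussians toward N(0, I), which is what keeps the latent space smooth enough to sample from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "encoder" weights standing in for trained networks.
W_MU = rng.normal(size=(2, 4))
W_LOGVAR = 0.1 * rng.normal(size=(2, 4))


def ae_encode(x):
    # Plain autoencoder: one deterministic latent point per input.
    return W_MU @ x


def vae_encode(x, rng):
    # VAE: the encoder outputs mu and log(sigma^2) of q(z|x) = N(mu, diag(sigma^2)).
    mu = W_MU @ x
    log_var = W_LOGVAR @ x
    sigma = np.exp(0.5 * log_var)
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps, mu, log_var


def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, sigma^2) || N(0, I)) summed over latent dimensions.
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)


x = np.ones(4)
z, mu, log_var = vae_encode(x, np.random.default_rng(1))
print(ae_encode(x), mu, z, kl_to_standard_normal(mu, log_var))
```

Under this view, mu and sigma parameterize a distribution over the latent space rather than being the manifold themselves; the decoder maps latent codes back to data space, and the set of points it can produce is what is usually described as the learned manifold.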

Can anyone clarify that for me, or maybe point me to a paper? Thanks

submitted by /u/eigenlaplace

[R] DistilBERT: A smaller, faster, cheaper, lighter BERT trained with distillation!

Written on August 27, 2019. Posted in Reddit MachineLearning.

HuggingFace released their first NLP transformer model, “DistilBERT”, which is similar to the BERT architecture but has only 66 million parameters (instead of 110 million) while keeping 95% of BERT’s performance on GLUE.
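For scale, 66M versus 110M parameters is a 40% reduction. The core idea of distillation is to train a small "student" to match the softened output distribution of a large "teacher". The numpy sketch below shows only that soft-target cross-entropy term (Hinton-style, with a temperature `T`); it is an illustration, not HuggingFace's training code, and the full DistilBERT procedure is the one described in their blog post.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soft targets: the teacher's distribution, softened by temperature T.
    p_teacher = np.exp(log_softmax(teacher_logits, temperature))
    log_p_student = log_softmax(student_logits, temperature)
    # Cross-entropy H(p_teacher, p_student), averaged over the batch and scaled
    # by T^2 so gradient magnitudes stay comparable across temperatures.
    ce = -(p_teacher * log_p_student).sum(axis=-1).mean()
    return float(ce * temperature ** 2)


teacher = np.array([[4.0, 1.0, -2.0]])
print(distillation_loss(teacher, teacher))             # student matches teacher
print(distillation_loss([[-2.0, 1.0, 4.0]], teacher))  # student disagrees
```

The loss is smallest when the student reproduces the teacher's soft distribution, at which point it equals T² times the teacher's entropy; any disagreement raises it.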

They released a blog post detailing the procedure, along with a hands-on walkthrough.

It is also available in their pytorch-transformers repository, alongside 7 other transformer models.

submitted by /u/jikkii

[R] A 2019 Guide to Speech Synthesis with Deep Learning

Written on August 27, 2019. Posted in Reddit MachineLearning.

In this article, I look at how deep learning has been used for voice generation.

https://heartbeat.fritz.ai/a-2019-guide-to-speech-synthesis-with-deep-learning-630afcafb9dd

submitted by /u/mwitiderrick

[D] Alternatives to Backpropagation

Written on August 27, 2019. Posted in Reddit MachineLearning.

Since it is now widely accepted that backpropagation is not biologically plausible, I would like to start a discussion about alternatives to it.

In my mind, a cool idea would be to evaluate the outputs of each layer individually, i.e., what should we expect to see as the output of hidden layer number L? This would remove the need for backward sweeps (because a layer’s ‘accuracy’ would depend only on itself) and make transfer learning a lot easier (because with layer-by-layer learning, we could put pieces together for similar tasks, with minor adjustments if necessary; e.g., the first layer of a CNN that identifies cats might be useful for identifying other felines).

However, nothing comes to mind as to how we could achieve that. As I see it, this would require labels (or at least some representation to compare what we’re getting with what we want), and I don’t think labels are required when we humans learn (at least not many labels).

Anyway, I’d love to hear ideas from other minds, as I think this is the best way for us to come up with new ideas.

Cheers guys, have a good one 🙂
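One existing family of approaches close to this idea is local (greedy layer-wise) learning with auxiliary classifiers: each layer gets its own small readout and is trained only on that readout's loss, so no error signal travels backwards between layers. The numpy sketch below illustrates this under assumptions (toy 2-D blobs, tanh layers, linear softmax readouts, a hand-picked learning rate). It still needs labels, which is exactly the limitation raised above, and each layer still uses a gradient, just a local one.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class LocalLayer:
    """A tanh layer trained only through its own auxiliary softmax readout."""

    def __init__(self, n_in, n_hidden, n_classes, rng):
        self.W = rng.normal(scale=0.5, size=(n_in, n_hidden))
        self.A = rng.normal(scale=0.5, size=(n_hidden, n_classes))  # auxiliary readout

    def forward(self, x):
        return np.tanh(x @ self.W)

    def local_step(self, x, y_onehot, lr):
        h = self.forward(x)
        p = softmax(h @ self.A)
        loss = -np.mean(np.sum(y_onehot * np.log(p + 1e-12), axis=1))
        # Gradient of this layer's local cross-entropy only; nothing flows to x.
        d_logits = (p - y_onehot) / len(x)
        d_h = d_logits @ self.A.T
        self.A -= lr * h.T @ d_logits
        self.W -= lr * x.T @ (d_h * (1.0 - h ** 2))
        return loss


def train_local(layers, x, y, n_classes, epochs=200, lr=0.5):
    y_onehot = np.eye(n_classes)[y]
    history = []
    for _ in range(epochs):
        inp, losses = x, []
        for layer in layers:
            losses.append(layer.local_step(inp, y_onehot, lr))
            # The next layer treats this output as fixed input data:
            # no error signal is sent back to earlier layers.
            inp = layer.forward(inp)
        history.append(losses)
    return history


rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-1.0, 0.5, size=(50, 2)), rng.normal(1.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
layers = [LocalLayer(2, 8, 2, rng), LocalLayer(8, 8, 2, rng)]
history = train_local(layers, x, y, n_classes=2)
print("first epoch:", history[0], "last epoch:", history[-1])
```

Because each layer only needs its own input and the labels, trained layers can in principle be reused or stacked independently, which is close to the transfer-learning benefit described in the post.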

submitted by /u/Berdas_

[D] Design a network that combines supervised (CNN) and unsupervised (AE) learning for a classification task

Written on August 27, 2019. Posted in Reddit MachineLearning.

Hello everyone! I’m working on an interesting problem, as you can tell from the post title, and wonder whether anyone has ideas or hints for it. As we know, autoencoders take an input (in my case, an image from a popular dataset) and reconstruct it as an output. Let’s call the input node 1 and the output node 3. During this process, the autoencoder creates valuable features in its hidden layers (let’s call them node 2). Let’s hypothesize that if node 2 is used as the input for a CNN, classification will improve. My current ideas are:

1 – For now, it sounds interesting and reasonable to try using the output of the encoder (the latent space representation) as the input for a subsequent CNN.

2 – Use one of the decoder layers as the input for the CNN.

A possible purpose of this: to extract more important features from class-imbalanced data (for example, of 5 classes, 1 contains 50% fewer images than the others). Let’s discuss?
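A minimal sketch of idea 1 (feed the encoder's latent code, node 2, to a classifier), under loud assumptions: the autoencoder is linear and fitted in closed form via SVD (an MSE-optimal linear autoencoder spans the top principal components), a nearest-centroid rule stands in for the CNN, and the data are synthetic 20-dimensional blobs with one class half the size of the other to mimic the imbalance described above.

```python
import numpy as np


def fit_linear_autoencoder(x, latent_dim):
    # Stand-in for a trained AE: an MSE-optimal *linear* autoencoder spans the
    # top principal components, so they are read off an SVD of centred data.
    mean = x.mean(axis=0)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    components = vt[:latent_dim]  # (latent_dim, n_features)

    def encode(data):  # node 1 -> node 2 (latent code)
        return (data - mean) @ components.T

    def decode(z):  # node 2 -> node 3 (reconstruction)
        return z @ components + mean

    return encode, decode


def fit_nearest_centroid(z, y):
    # Hypothetical stand-in for the CNN: classify by the closest class centroid.
    classes = np.unique(y)
    centroids = np.stack([z[y == c].mean(axis=0) for c in classes])

    def predict(z_new):
        dists = ((z_new[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        return classes[dists.argmin(axis=1)]

    return predict


rng = np.random.default_rng(0)
# Synthetic, imbalanced data: class 1 has half as many samples as class 0.
x = np.vstack([rng.normal(0.0, 1.0, size=(100, 20)), rng.normal(3.0, 1.0, size=(50, 20))])
y = np.array([0] * 100 + [1] * 50)

encode, decode = fit_linear_autoencoder(x, latent_dim=2)
z = encode(x)
predict = fit_nearest_centroid(z, y)
print("latent shape:", z.shape, "train accuracy:", (predict(z) == y).mean())
```

With a real convolutional autoencoder, keeping the encoder's output as spatial feature maps (rather than a flat vector) is what would let a CNN still convolve over node 2.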

submitted by /u/brhrrr

[D] Computing `q dot q` instead of `q dot k` when calculating scores for self-attention in Transformer

Written on August 27, 2019. Posted in Reddit MachineLearning.

While going through the Transformer paper and its implementation, I came up with a question:

In the self-attention routine in the encoder, is it plausible to compute q dot q instead of q dot k when calculating scores for each input token?

I see that in self-attention, `memory_antecedent = query_antecedent`, and q, k, and v are computed (and trained) separately (cf. `compute_qkv` in T2T).

Would utilizing the same q for the computation of scores (rather than having a separate k) seriously deteriorate the performance?
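One structural consequence of tying the keys to the queries can be checked directly: the score matrix QQᵀ is symmetric, so token i scores token j exactly as j scores i, and each diagonal entry ‖q_i‖²/√d_k is non-negative and, by Cauchy–Schwarz, at least as large as the score with any token whose query has no larger norm. With separate projections, QKᵀ has neither property in general. The shapes and random weights below are arbitrary; whether the constraint hurts accuracy in practice is an empirical question this sketch does not answer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, d_k = 5, 8, 4
x = rng.normal(size=(n_tokens, d_model))  # token representations
W_q = rng.normal(size=(d_model, d_k))     # query projection
W_k = rng.normal(size=(d_model, d_k))     # separate key projection

q, k = x @ W_q, x @ W_k
scores_qk = q @ k.T / np.sqrt(d_k)  # standard scaled dot-product scores
scores_qq = q @ q.T / np.sqrt(d_k)  # tied variant: keys replaced by queries

print(np.allclose(scores_qk, scores_qk.T))  # symmetric only by coincidence
print(np.allclose(scores_qq, scores_qq.T))  # always symmetric
print(np.diag(scores_qq))                   # ||q_i||^2 / sqrt(d_k) >= 0
```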

submitted by /u/kingsiguk

[D] Is learning label embeddings by factorizing a label co-occurrence matrix unsupervised learning?

Written on August 27, 2019. Posted in Reddit MachineLearning.

Hi all!

I was working on creating embeddings for medical concepts. These terms/phrases are used for annotating biomedical documents. Usually, creating a co-occurrence matrix and then factorizing it to obtain dense, lower-dimensional vectors is termed unsupervised learning, since no annotated data is involved. I am using the same process, but for the annotations themselves. Does this qualify as supervised learning, since I need annotated data, or as unsupervised learning, since the method of obtaining the embeddings is unsupervised?
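For reference, the mechanics described above look like this on a handful of hypothetical annotation sets (the concept labels are made up): count how often two labels annotate the same document, then keep the top-k singular directions of the co-occurrence matrix as dense embeddings. Nothing in the factorization predicts a target; the annotations only enter as co-occurrence counts.

```python
from itertools import combinations

import numpy as np

# Hypothetical annotation sets: each document is tagged with concept labels.
documents = [
    {"diabetes", "insulin", "obesity"},
    {"diabetes", "insulin"},
    {"hypertension", "obesity"},
    {"hypertension", "stroke"},
    {"stroke", "hypertension", "obesity"},
]

labels = sorted(set().union(*documents))
index = {label: i for i, label in enumerate(labels)}

# Symmetric co-occurrence counts: how often two labels tag the same document.
cooc = np.zeros((len(labels), len(labels)))
for doc in documents:
    for a, b in combinations(sorted(doc), 2):
        cooc[index[a], index[b]] += 1
        cooc[index[b], index[a]] += 1

# Truncated SVD: the top-k singular directions become dense label embeddings.
k = 2
u, s, _ = np.linalg.svd(cooc)
embeddings = u[:, :k] * np.sqrt(s[:k])

for label in labels:
    print(label, np.round(embeddings[index[label]], 3))
```

In practice, raw counts are often reweighted (for example with PMI) before factorizing, but the supervised/unsupervised question is the same either way.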

submitted by /u/atif_hassan
