Category: Reddit MachineLearning
[R] NeurIPS 2019 Livestream
aideeptalk will livestream the expo and posters at NeurIPS 2019 on Twitch at twitch.tv/aideeptalk
To receive a notification when we go live, please follow us and enable notifications on our Twitch channel.
Follow us on twitter.com/aideeptalk for our schedule.
Please pass this on to those who can’t make it to NeurIPS.
For more details, see our website aideeptalk.com
submitted by /u/aideeptalk
[link] [comments]
[D] Confused about generating a translation using Transformer
I’m reading the Attention Is All You Need paper and it doesn’t seem to explain how exactly the Transformer is used to generate a translation. Here’s how I understand it so far (please correct if I’m wrong):
- A sequence of k tokens comes in as one-hot vectors of length v – the vocab size. This is a (k x v) token matrix.
- The tokens are embedded in d_m (model size, e.g. 512) dimensional space via multiplication by an embedding matrix E of dim (v x d_m), yielding a (k x d_m) matrix.
- Positional encodings are added; the dim is still (k x d_m).
Encoding:
- Encoder block takes in the (k x d_m) matrix and outputs another (k x d_m) matrix.
- Repeat N times to get a final (k x d_m) matrix, i.e. the encoder output.
Now for decoding:
- The decoder takes in a (p x d_m) matrix (the embedded target tokens generated so far) and adds positional encodings.
- The (non-masked) multi-head attention function inside the decoder receives the encoder’s (k x d_m) output as the key K and value V, and a (p x d_m) matrix as the query Q, yielding a (p x d_m) output.
- The final output of the decoder is therefore (p x d_m).
Final output:
- The (p x d_m) decoder output is mapped to (p x v) by a matrix multiply (Question: they say it’s “tied” to the embedding matrix E, so is this just E^T?).
- Apply a softmax to each of the p rows and take the argmax, so you get p tokens out.
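The shape bookkeeping in the steps above can be checked with random stand-in matrices (a toy sketch, not the real model: a small vocab, no attention layers, and random values in place of the learned weights):

```python
import numpy as np

v, d_m = 1000, 512    # toy vocab size, model dimension
k, p = 7, 3           # source length, current target length

E = np.random.randn(v, d_m)                            # embedding matrix, tied to the output
src_onehot = np.eye(v)[np.random.randint(v, size=k)]   # (k x v) one-hot tokens
src = src_onehot @ E                                   # (k x d_m) embedded source
# ... positional encodings added, encoder block applied N times: still (k x d_m)
enc_out = src                                          # stand-in for the encoder output

tgt = np.random.randn(p, d_m)                          # stand-in for the decoder output
logits = tgt @ E.T                  # weight tying: (p x d_m) @ (d_m x v) = (p x v)
next_tokens = logits.argmax(axis=1) # argmax per row (after softmax) -> p token ids
```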
Suppose I want to translate the sequence “This attention paper is super confusing !” into German. Here k = 7, so my encoder outputs a (7 x 512) matrix. From here, can someone walk me through the steps of generating the translation?
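As I understand it, the standard answer is autoregressive greedy decoding: start the decoder input with just a start-of-sequence token (p = 1), take the argmax of the last row, append it, re-run the decoder with p = 2, and repeat until an end token appears. A minimal sketch, where `decode` is a stand-in for the whole decoder stack (the names BOS/EOS and the dummy logits are mine, not from the paper):

```python
import numpy as np

BOS, EOS, v, d_m = 1, 2, 1000, 512

def decode(enc_out, tgt_ids):
    """Stand-in for the decoder stack: returns (p x v) logits.
    A real Transformer would embed tgt_ids, add positions, run the N
    masked-attention blocks against enc_out, then multiply by E^T."""
    rng = np.random.default_rng(len(tgt_ids))  # deterministic dummy logits
    return rng.standard_normal((len(tgt_ids), v))

enc_out = np.random.randn(7, d_m)   # encoder output for the 7 source tokens

tgt_ids = [BOS]
for _ in range(20):                        # cap the output length
    logits = decode(enc_out, tgt_ids)      # (p x v)
    next_id = int(logits[-1].argmax())     # only the LAST row picks a new token
    tgt_ids.append(next_id)
    if next_id == EOS:
        break

translation = tgt_ids[1:]   # drop BOS; map ids back to (German) tokens
```

Beam search replaces the per-step argmax with keeping the top few partial sequences, but the re-run-the-decoder loop is the same.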
Thanks for looking at my question and have an awesome day!
submitted by /u/ME_PhD
[link] [comments]
[R] Piecewise Strong Convexity of Neural Networks
Paper: https://arxiv.org/abs/1810.12805
Video summary: https://www.youtube.com/watch?v=z89BTMQGVn
Earlier related work: https://arxiv.org/abs/1607.04917 (piecewise convexity)
I am not the author. This paper will be presented at NeurIPS this month and establishes some convexity results about piecewise-linear neural networks under the least-squares loss – namely piecewise strong convexity and the non-existence of differentiable local maxima. The approach is a spectral analysis of the Hessian and the weights of the network. The result is a relatively attractive convergence estimate for SGD.
I guess this provides some more motivation for studying techniques like ADMM, which have convergence properties for some classes of piecewise functions and can exploit Lipschitz-continuous gradients. Nice work!
submitted by /u/i-heart-turtles
[link] [comments]
[D] Best network for battle game agents (neuroevolution)
Hi,
I’m working on an indie game where you evolve teams of agents that each have a neural network, and then battle them against other players’ teams (looks like this: https://youtu.be/EPekL1JMXEY).
I already have an implementation of sparse LSTM-ish networks (1), but I’d like to optimize this further and wanted to see what people here suggest. Since it’s an evolution-based game I don’t use backprop. It also needs to be fairly simple, since it all runs on the GPU (which is why I can simulate thousands of agents on a single machine), and for the same reason I’d prefer a fixed-size architecture, which is why I’ve stayed away from NEAT so far.
So my question is: what would be the best network for something like this?
(1) My current network works as follows. Each node has: a state (1 float), 12 indices, 12 weights, and 2 bias values. Each index decides which other node it “reads” from, so a node can be connected to any other node (layers are therefore less important; they only decide the order of updates). Eight of the inputs feed the next value of the state; the other four act as a write gate (-1 = keep state -> 1 = update state). There are some more details but that’s roughly it.
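A rough CPU sketch of one update step for the network described in (1) – the tanh activations and the exact way the gate blends old and new state are my assumptions, not the poster’s:

```python
import numpy as np

n_nodes = 64
state = np.random.randn(n_nodes)                      # 1 float of state per node
idx = np.random.randint(n_nodes, size=(n_nodes, 12))  # which node each input reads
w = np.random.randn(n_nodes, 12)                      # 12 weights per node
b = np.random.randn(n_nodes, 2)                       # 2 biases per node

def step(state):
    inputs = state[idx] * w                              # gather reads: (n_nodes, 12)
    cand = np.tanh(inputs[:, :8].sum(axis=1) + b[:, 0])  # 8 inputs -> candidate state
    gate = np.tanh(inputs[:, 8:].sum(axis=1) + b[:, 1])  # 4 inputs -> gate in [-1, 1]
    g = (gate + 1.0) / 2.0                               # -1 (keep) .. 1 (update) -> 0..1
    return (1.0 - g) * state + g * cand                  # blend old state and candidate

state = step(state)
```

Since this is just gathers and elementwise ops on fixed-size arrays, it maps straightforwardly onto a GPU kernel with one thread per node, and mutation is just perturbing `idx`, `w`, and `b`.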
submitted by /u/FredrikNoren
[link] [comments]
[D] Figure Eight and alternatives
Hey,
We are considering using Figure Eight at our company.
Has anyone tried using Figure Eight as an annotation platform? What are the pros and cons? How does it compare to Amazon in terms of price, annotation quality, etc.?
Would love to hear your opinions.
submitted by /u/guzguzit
[link] [comments]
[D] Evolutionary Algorithms researchers, do you feel like a new library is needed?
Researchers who work with Evolutionary Algorithms: do the current libraries satisfy your needs? Do you feel there is a need for a new library? What features are missing that you need?
submitted by /u/ghost_shaba7
[link] [comments]
[D] Efficient Partial Dependence Plots with decision trees
Partial Dependence Plots (PDPs) are a standard model inspection technique. It turns out that for decision trees, they can be computed very efficiently. This post explains how PDPs are computed in general, and goes into the details of the optimized version for tree models.
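For reference, the general (brute-force) PDP computation the post starts from is simple to sketch: for each grid value of the feature, overwrite that feature column for every row and average the model’s predictions (assuming any model with a scikit-learn-style `.predict`):

```python
import numpy as np

def partial_dependence(model, X, feature, grid):
    """Brute-force PDP: one full prediction pass per grid value."""
    X = np.asarray(X, dtype=float)
    pdp = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v       # force the feature to the grid value for every row
        pdp.append(model.predict(Xv).mean())
    return np.array(pdp)
```

This costs one prediction pass over the whole dataset per grid value; the tree-specific optimization the post describes exploits the tree structure to avoid repeating that full pass.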
submitted by /u/Niourf
[link] [comments]
[D] Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
A recent paper by Cynthia Rudin claims “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead”: https://arxiv.org/abs/1811.10154
A summary of the paper can be found here: https://www.kdnuggets.com/2019/11/stop-explaining-black-box-models.html
Thoughts?
submitted by /u/selib
[link] [comments]
[D] Validating regression models on edge cases?
I’m trying to predict USED car prices, given some x number of parameters.
The R² is > 0.98 on the test data, but on new data with edge cases it misses predictions by (what I think is) too much.
Beyond the evaluation metric, how can we validate that a prediction is good enough, even for an edge case?
Currently, I’m thinking of fitting a simple linear regression of price on age and kilometers across a range of values. That would give me a baseline model I could check my edge-case predictions against, to pull them toward a more average case.
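One way that baseline idea could be operationalized (a sketch with made-up toy data and a made-up tolerance; the column names and the 50% threshold are my assumptions):

```python
import numpy as np

# toy data: columns are age (years) and kilometers; target is price
X = np.array([[2, 30_000], [5, 80_000], [8, 150_000], [10, 200_000]], dtype=float)
y = np.array([20_000, 14_000, 8_000, 5_000], dtype=float)

# fit price ~ age + km by ordinary least squares (with an intercept)
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def baseline_price(age, km):
    return coef[0] + coef[1] * age + coef[2] * km

def looks_sane(model_pred, age, km, tol=0.5):
    """Flag an edge-case prediction that strays more than tol (50%)
    from the simple linear baseline."""
    ref = baseline_price(age, km)
    return abs(model_pred - ref) <= tol * abs(ref)
```

A flagged prediction isn’t necessarily wrong, but it’s a cheap trigger for a manual look or for falling back to the baseline.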
I’m really just seeking advice on what to do here. Is the approach good enough? What are other approaches for validation / sanity checking if each sample we try to predict individually is good enough?
submitted by /u/permalip
[link] [comments]