
Category: Reddit MachineLearning

[R] NeurIPS 2019 Livestream

aideeptalk will livestream the expo and posters at NeurIPS 2019 on Twitch at twitch.tv/aideeptalk

To receive a notification when we go live, please follow us and enable notifications on our Twitch channel.

Follow us on twitter.com/aideeptalk for our schedule.

Please pass this on to those who can’t make it to NeurIPS.

For more details see our website aideeptalk.com

submitted by /u/aideeptalk

[D] Confused about generating a translation using Transformer

I’m reading the Attention Is All You Need paper and it doesn’t seem to explain exactly how the Transformer is used to generate a translation. Here’s how I understand it so far (please correct me if I’m wrong):

  1. A sequence of k tokens comes in as one-hot vectors of length v – the vocab size. This is a (k x v) token matrix.
  2. The tokens are embedded in d_m (model size, e.g. 512) dimensional space via multiplication by an embedding matrix E of dim (v x d_m), yielding a (k x d_m) matrix.
  3. Positional encodings are added; the dim is still (k x d_m).
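The input steps above can be sketched in NumPy to check the shapes (the vocab size, random weights, and gathering rows by token id instead of an explicit one-hot multiply are illustrative choices, not the paper's exact setup):

```python
import numpy as np

k, v, d_m = 7, 10000, 512            # sequence length, vocab size, model dim

rng = np.random.default_rng(0)
tokens = rng.integers(0, v, size=k)  # token ids (equivalent to one-hot rows)
E = rng.normal(0, d_m ** -0.5, size=(v, d_m))  # embedding matrix, (v x d_m)

X = E[tokens]                        # (k x d_m); same as one_hot(tokens) @ E

# sinusoidal positional encodings from the paper
pos = np.arange(k)[:, None]
i = np.arange(d_m)[None, :]
angle = pos / np.power(10000.0, (2 * (i // 2)) / d_m)
PE = np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

X = X + PE                           # still (k x d_m)
```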

Encoding:

  1. Encoder block takes in the (k x d_m) matrix and outputs another (k x d_m) matrix.
  2. Repeat N times to get a final (k x d_m) matrix, i.e. the encoder output.

Now for decoding:

  1. The decoder takes in a (p x d_m) matrix and adds positional encodings.
  2. The (non-masked) multi-head attention function inside the decoder receives encoder’s (k x d_m) output as key K, and value V, and a (p x d_m) matrix as the query Q, yielding a (p x d_m) output.
  3. The final output of the decoder is therefore (p x d_m).
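The shape bookkeeping in step 2 can be checked with a bare scaled dot-product attention sketch (single head, and without the learned Q/K/V projection matrices that the full model includes):

```python
import numpy as np

def cross_attention(Q, K, V):
    """Scaled dot-product attention: Q is (p x d), K and V are (k x d)."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # (p x k) similarity scores
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                 # row-wise softmax
    return w @ V                                  # (p x d) output

p, k, d_m = 3, 7, 512
rng = np.random.default_rng(0)
Q = rng.normal(size=(p, d_m))                     # decoder-side queries
K = V = rng.normal(size=(k, d_m))                 # encoder memory
out = cross_attention(Q, K, V)                    # (p x d_m), as in step 2
```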

Final output:

  1. The (p x d_m) decoder output is mapped to (p x v) by a matrix multiply (Question: they say it’s “tied” to the embedding matrix E, so is this just E^T?).
  2. Apply a softmax over each of the p rows and take the argmax, so you get p tokens out.

Suppose I want to translate the sequence “This attention paper is super confusing !” into German. Here k = 7, so my encoder outputs a (7 x 512) matrix. From here, can someone walk me through the steps of generating the translation?
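For concreteness, here is how I currently picture the generation loop: run the encoder once on the 7 source tokens, then repeatedly feed the growing partial output (starting from a `<bos>` token) through the decoder and pick the next token from the last row of the logits. A greedy sketch, where `encode` and `decode` are placeholders for the trained stacks and E is the tied embedding matrix:

```python
import numpy as np

def greedy_translate(encode, decode, E, src_ids, bos_id, eos_id, max_len=50):
    """Greedy autoregressive decoding sketch.

    encode(src_ids)         -> (k x d_m) encoder memory
    decode(tgt_ids, memory) -> (p x d_m) decoder states
    E                       -> (v x d_m) tied embedding matrix
    """
    memory = encode(src_ids)                # the encoder runs only once
    tgt = [bos_id]                          # start the output with <bos>
    for _ in range(max_len):
        H = decode(np.array(tgt), memory)   # (p x d_m) decoder output
        logits = H[-1] @ E.T                # tied projection, last position -> (v,)
        next_id = int(np.argmax(logits))    # softmax is monotonic, argmax suffices
        tgt.append(next_id)
        if next_id == eos_id:               # stop at end-of-sequence
            break
    return tgt[1:]                          # translation without <bos>
```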

Thanks for looking at my question and have an awesome day!

submitted by /u/ME_PhD

[R] Piecewise Strong Convexity of Neural Networks

Paper: https://arxiv.org/abs/1810.12805

Video summary: https://www.youtube.com/watch?v=z89BTMQGVn

Earlier related work: https://arxiv.org/abs/1607.04917 (piecewise convexity)

I am not the author. This paper will be presented at NeurIPS this month and establishes some convexity results for piecewise-linear neural networks under the least-squares loss – namely piecewise strong convexity and the non-existence of differentiable local maxima. The approach is a spectral analysis of the Hessian and weights of the network. The result is a relatively attractive convergence estimate for SGD.

I guess this provides some more motivation for studying techniques like ADMM, which have convergence properties for some classes of piecewise functions and can exploit Lipschitz-continuous gradients. Nice work!

submitted by /u/i-heart-turtles

[D] Best network for battle game agents (neuroevolution)

Hi,

I’m working on an indie game where you evolve teams of agents that each have a neural network, and then battle them against other players’ teams (it looks like this: https://youtu.be/EPekL1JMXEY).

I already have an implementation of sparse LSTM-ish networks (1), but I’d like to optimize this further and wanted to see what people here suggest. Since it’s an evolution-based game, I don’t use backprop. It also needs to be fairly simple, as it all runs on the GPU (which is why I can have simulations of thousands of agents on a single machine). And since it all runs on the GPU, I’d prefer something of fixed size, which is why I’ve stayed away from NEAT so far.

So my question is: what would be the best network for something like this?

(1) My current network works as follows. Each node has a state (1 float), 12 indices, 12 weights, and 2 bias values. Each index decides which other node it “reads” from, so a node can be connected to any other node (layers are therefore less important; they only decide the order of updates). 8 of the inputs are used for the next value of the state. 4 of the inputs are used as a write gate (-1 keep state -> 1 update state). There are some more details, but that’s roughly it.
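A rough NumPy sketch of one update pass over the node structure described in (1) (the tanh nonlinearities and the exact mapping of the gate from [-1, 1] to a blend factor are my illustrative assumptions, not necessarily the game's choices):

```python
import numpy as np

def step(states, idx, w, bias):
    """One update pass over all n nodes.

    states: (n,)     current node states
    idx:    (n, 12)  indices of the nodes each node reads from
    w:      (n, 12)  connection weights
    bias:   (n, 2)   biases for the value and gate paths
    """
    reads = states[idx] * w                            # (n, 12) gathered, weighted inputs
    value = np.tanh(reads[:, :8].sum(1) + bias[:, 0])  # 8 inputs -> candidate state
    gate = np.tanh(reads[:, 8:].sum(1) + bias[:, 1])   # 4 inputs -> -1 keep .. 1 update
    a = (gate + 1) / 2                                 # map gate to a blend in [0, 1]
    return (1 - a) * states + a * value                # write-gated state update
```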

submitted by /u/FredrikNoren

[D] Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

A recent paper by Cynthia Rudin claims “Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead”: https://arxiv.org/abs/1811.10154

A summary of the paper can be found here: https://www.kdnuggets.com/2019/11/stop-explaining-black-box-models.html

Thoughts?

submitted by /u/selib

[D] Validating regression models on edge cases?

I’m trying to predict USED car prices, given some x number of parameters.

The R2 is > 0.98 on the testing data, but on new edge-case data the predictions are off by (what I think of as) too much.

Beyond the evaluation metric, how can we validate that a prediction is good enough, even for an edge case?

Currently, I’m thinking of fitting a simple linear regression model on varying ages and kilometer counts to predict price. This would give me a baseline I could compare my edge-case predictions against, to check whether they stay close to the average case.
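A minimal sketch of that baseline idea, assuming age and kilometers as the only features and flagging a prediction when it deviates from the simple linear fit by more than a relative tolerance (the 30% threshold is a placeholder, not a recommendation):

```python
import numpy as np

def fit_baseline(age, km, price):
    """Least-squares fit of price ~ b0 + b1*age + b2*km."""
    X = np.column_stack([np.ones_like(age), age, km])
    coef, *_ = np.linalg.lstsq(X, price, rcond=None)
    return coef

def flag_prediction(coef, age, km, model_price, tol=0.3):
    """True if the main model's price strays too far from the linear baseline."""
    baseline = coef @ np.array([1.0, age, km])
    return abs(model_price - baseline) / max(abs(baseline), 1e-9) > tol
```

The baseline is deliberately crude; its only job is a sanity check that edge-case predictions still follow the average trend.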

I’m really just seeking advice on what to do here. Is the approach good enough? What are other approaches for validation / sanity checking if each sample we try to predict individually is good enough?

submitted by /u/permalip