Author: torontoai

[D] Tensorflow User Experience

Written on December 6, 2019. Posted in Reddit MachineLearning.

https://nostalgebraist.tumblr.com/post/189464877164/attention-conservation-notice-machine-learning

Originally saw this on Hacker News

https://news.ycombinator.com/item?id=21710863

Are things really this bad? Wasn’t the TF 2.0 API cleanup supposed to make Keras the standard API for TPUs? Why doesn’t he use that?

Edit: also, is this an indictment of TF in general or just TPUs?

submitted by /u/justin285

[P] How can I make my rendered training data match real data better?

Written on December 6, 2019. Posted in Reddit MachineLearning.

I’m trying to detect a single type of box in a camera image. Instead of using hand-labelled images for training, I want to create the data from a 3D model using Blender and a Python script.

So far I successfully created a dataset and trained RetinaNet on it. I do apply some augmentation (color shifts, saturation changes, noise, blurring, sharpening).
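A minimal sketch of the kind of per-image augmentation described above, covering only the color shift, saturation change, and noise (blurring and sharpening are left out). The function names and parameter ranges are illustrative assumptions, not the settings actually used for this dataset.

```python
import colorsys
import random


def augment_pixel(rgb, hue_shift, sat_scale, noise_std, rng):
    """Shift hue, scale saturation, and add Gaussian noise to one RGB pixel.

    Channel values are floats in [0, 1].
    """
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + hue_shift) % 1.0                 # color shift (hue wraps around)
    s = min(1.0, max(0.0, s * sat_scale))     # saturation change, clamped
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    # Independent Gaussian noise per channel, clamped back into [0, 1].
    return tuple(
        min(1.0, max(0.0, c + rng.gauss(0.0, noise_std))) for c in (r, g, b)
    )


def augment_image(image, rng):
    """Augment an image given as a list of rows of (r, g, b) tuples.

    One set of random parameters is drawn per image (hypothetical ranges).
    """
    hue_shift = rng.uniform(-0.05, 0.05)
    sat_scale = rng.uniform(0.7, 1.3)
    noise_std = rng.uniform(0.0, 0.03)
    return [
        [augment_pixel(p, hue_shift, sat_scale, noise_std, rng) for p in row]
        for row in image
    ]


rng = random.Random(0)
image = [[(0.8, 0.4, 0.2)] * 4 for _ in range(3)]  # tiny 3x4 test image
augmented = augment_image(image, rng)
```

Drawing the parameters once per image keeps the color shift consistent across the whole frame, while the noise is still sampled independently for every pixel.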

The results on a validation set (consisting of synthetic data too) are great, but the localization performance on real images is way worse.

What changes should I make to my rendering process to match real images better?

Since it’s a virtual environment, I have pretty much unlimited control over everything, but I have no clue what makes sense to try varying. Some of the detections are flawless, but others are way off and I can’t tell what’s the visual difference that throws the network off.

An example of a rendered image (training set)

Excellent results on validation set (half-hidden boxes are not supposed to be detected)

Localization problems on real images

submitted by /u/Single_Blueberry

[R] How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?

Written on December 6, 2019. Posted in Reddit MachineLearning.

submitted by /u/arkady_red

Run BERT on a mobile phone’s single A76 CPU core in 13 ms

Written on December 6, 2019. Posted in Reddit MachineLearning.

submitted by /u/I_ai_AI

[1912.02762] Normalizing Flows for Probabilistic Modeling and Inference

Written on December 6, 2019. Posted in Reddit MachineLearning.

submitted by /u/hardmaru

[R] Do the loss landscapes of neural networks tend to resemble the Earth’s own topography in regards to min/max elevation regions?

Written on December 6, 2019. Posted in Reddit MachineLearning.

Do the lowest-loss regions of a NN tend to cluster together, the way high peaks cluster in mountain ranges on Earth (assuming, of course, that we are talking about the negative loss function, so it’s maximization instead of minimization)? To elucidate, the highest summits on Earth tend to have many other peaks nearby with similar (but slightly lower) elevations, due to plate tectonics. Given no prior information about Earth’s topography and assuming a uniform distribution, one might expect the “tall” points on Earth to be spread rather randomly across its surface, but this isn’t the case: 90+% of the “tall” points on Earth are contained in less than 10% of the landmass. As a corollary, high peaks are very rarely not surrounded by other high peaks.

So does the NN loss landscape resemble Earth in this respect? Or are there pretty much just solo peaks dispersed rather randomly across the negative loss landscape? The former would imply that if one is at a “high” point (say a local max or saddle point), then other high(er) points are likely nearby.

The only literature I can seem to find exploring such an idea is here: https://arxiv.org/abs/1712.09913. The authors of this paper mapped the ratio of the minimum to the maximum eigenvalue of the Hessian to gauge the convexity of regions of a NN’s loss surface. It seemed to indicate the latter of these scenarios for the “smoother” networks (that solo peaks tend to occur more often) and the former for the more chaotic networks, but I could be misinterpreting. I’m unsure if convexity alone even helps answer my question, since many peaks close by could all still have strongly convex curvatures.
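To make the eigenvalue-ratio idea concrete, here is a rough sketch on a toy two-parameter "loss", not a real network. It estimates the Hessian by central finite differences and returns the signed ratio of the smallest to the largest eigenvalue; the toy functions, the step size, and the choice to keep the sign (rather than take an absolute value) are all assumptions made for illustration.

```python
import math


def hessian_2d(f, x, y, h=1e-3):
    """Central finite-difference Hessian entries (fxx, fxy, fyy) of f at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    fxy = (
        f(x + h, y + h) - f(x + h, y - h) - f(x - h, y + h) + f(x - h, y - h)
    ) / (4 * h**2)
    return fxx, fxy, fyy


def eigenvalues_sym_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius


def curvature_ratio(f, x, y):
    """Signed ratio lambda_min / lambda_max (assumes lambda_max is nonzero).

    A negative value means there is a direction of negative curvature;
    a value close to zero or positive means the point looks locally convex.
    """
    lam_min, lam_max = eigenvalues_sym_2x2(*hessian_2d(f, x, y))
    return lam_min / lam_max


def bowl(x, y):
    return x**2 + 2 * y**2   # convex everywhere, eigenvalues 2 and 4


def saddle(x, y):
    return x**2 - y**2       # one negative-curvature direction, eigenvalues -2 and 2


bowl_ratio = curvature_ratio(bowl, 0.5, -0.3)
saddle_ratio = curvature_ratio(saddle, 0.5, -0.3)
```

For a real network the Hessian is far too large to form explicitly, so this only illustrates what the ratio measures, not how one would compute it at scale.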

Interested to hear others’ thoughts on the matter.

Bonus: I’m interested in this question from the perspective of Deep Q-networks (DQN) and policy gradient algorithms in reinforcement learning. I’m aware these have different loss landscapes than supervised learning due to the scarcity of rewards in RL, but if anyone has specific insights on this then that’d be great. If you’re not familiar with RL, then just assume this is about strongly supervised learning tasks such as image classification. Thanks.

submitted by /u/debussyxx

[R] Combining Q-Learning and Search with Amortized Value Estimates

Written on December 6, 2019. Posted in Reddit MachineLearning.

submitted by /u/hardmaru

[D] BERT “pooled” output? What kind of pooling?

Written on December 6, 2019. Posted in Reddit MachineLearning.

Quick question from https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1

pooled_output: pooled output of the entire sequence with shape [batch_size, hidden_size]

What kind of pooling are they talking about here? I don’t see it mentioned in the paper. Thanks.
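For context, in the reference BERT implementation the "pooling" is not a mean or max over tokens: the final hidden state of the first token ([CLS]) is passed through a dense layer with a tanh activation. A minimal sketch of that computation for a single sequence, using plain lists; the function name and the weight layout (one row of weights per output unit) are assumptions for illustration, not the TF Hub module’s actual API.

```python
import math


def bert_style_pool(sequence_output, weight, bias):
    """Pool one sequence the way the reference BERT code does.

    sequence_output: [seq_len][hidden_size] final hidden states
    weight:          [hidden_size][hidden_size], one row per output unit (assumed layout)
    bias:            [hidden_size]
    Returns a [hidden_size] vector: tanh(W @ h_first + b).
    """
    first_token = sequence_output[0]  # hidden state of the [CLS] position
    return [
        math.tanh(sum(x * w for x, w in zip(first_token, unit_weights)) + b)
        for unit_weights, b in zip(weight, bias)
    ]


# Toy example with hidden_size = 2 and an identity weight matrix.
sequence_output = [[1.0, 2.0], [9.0, 9.0], [-3.0, 0.5]]
weight = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]
pooled = bert_style_pool(sequence_output, weight, bias)
```

Only the first row of `sequence_output` affects the result, which is why the output has shape [batch_size, hidden_size] once this is applied to every sequence in a batch.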

submitted by /u/ME_PhD

Director of Data Scientist, People Analytics – RBC – Toronto, ON

Written on December 5, 2019. Posted in Toronto Job Postings.

Deploy production-scale solutions, transforming statistical and machine learning models. 10+ years of experience with big data technologies, Machine Learning,…
From RBC – Fri, 06 Dec 2019 23:47:18 GMT

Director of Data Science, People Analytics – RBC – Toronto, ON

Written on December 5, 2019. Posted in Toronto Job Postings.
