Category: Reddit MachineLearning
[R] Do the loss landscapes of neural networks tend to resemble the Earth’s own topography with regard to regions of minimum and maximum elevation?
Do the lowest-loss regions of a NN tend to congregate in distinct regions with many high peaks, like mountain ranges on Earth (assuming, of course, we are talking about the negative loss function, so it’s maximization instead of minimization)? To elucidate: the highest summits on Earth tend to have many other peaks nearby with similar (but slightly lower) elevations, due to plate tectonics. Given no prior information about Earth’s topography, and assuming a uniform distribution, one might expect the “tall” points on Earth to be spread rather randomly across the surface, but this isn’t the case: over 90% of the “tall” points on Earth are contained in less than 10% of the landmass. As a corollary, high peaks are very rarely not surrounded by other high peaks.
So does the NN loss landscape resemble this scenario, as on Earth? Or are there mostly just solo peaks dispersed rather randomly across the negative loss landscape? The former would seem to imply that if one is at a “high” point (say, a local maximum or saddle point), then other, possibly higher, points are likely nearby.
The only literature I can seem to find exploring such an idea is here: https://arxiv.org/abs/1712.09913. The authors map the ratio of the minimum to the maximum eigenvalue of the Hessian to gauge the convexity of regions of the loss landscape. Their results seem to indicate the latter scenario for the “smoother” networks (that solo peaks occur more often) and the former for the more chaotic networks, but I could be misinterpreting them. I’m also unsure whether convexity alone even helps answer my question, since many nearby peaks could each still have strongly convex curvature.
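For intuition, the paper’s convexity measure can be sketched on a toy two-parameter loss: estimate the Hessian by finite differences and look at the sign of the minimum-to-maximum eigenvalue ratio (a negative ratio flags a non-convex direction, e.g. at a saddle). The toy loss below is an illustrative stand-in, not anything from the paper:

```python
import numpy as np

def loss(w):
    # Toy non-convex "landscape": two minima at x = +/-1 separated by a saddle at 0.
    x, y = w
    return (x**2 - 1)**2 + 0.5 * y**2

def hessian(f, w, eps=1e-4):
    """Central finite-difference Hessian of f at point w."""
    n = len(w)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n); e_i[i] = eps
            e_j = np.zeros(n); e_j[j] = eps
            H[i, j] = (f(w + e_i + e_j) - f(w + e_i - e_j)
                       - f(w - e_i + e_j) + f(w - e_i - e_j)) / (4 * eps**2)
    return H

for point in [np.array([1.0, 0.0]),    # a minimum: all eigenvalues > 0
              np.array([0.0, 0.0])]:   # the saddle: mixed-sign eigenvalues
    eigs = np.linalg.eigvalsh(hessian(loss, point))
    ratio = eigs.min() / eigs.max()    # negative ratio => non-convex region
    print(point, eigs.round(3), round(ratio, 3))
```

At the minimum the ratio is positive (locally convex); at the saddle it is negative, which is the kind of signal the paper’s Hessian maps are built from.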
Interested to hear others’ thoughts on the matter.
Bonus: I’m interested in this question from the perspective of deep Q-networks (DQN) and policy gradient algorithms in reinforcement learning. I’m aware these have different loss landscapes than supervised learning due to the sparsity of rewards in RL, but if anyone has specific insights on this, that’d be great. If you’re not familiar with RL, just assume this is about strongly supervised learning tasks such as image classification. Thanks.
submitted by /u/debussyxx
[D] BERT “pooled” output? What kind of pooling?
Quick question from https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1
pooled_output: pooled output of the entire sequence with shape [batch_size, hidden_size]
What kind of pooling are they talking about here? I don’t see it mentioned in the paper. Thanks.
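For what it’s worth, the BERT reference implementation (modeling.py in the google-research/bert repo) computes pooled_output not by mean- or max-pooling over the sequence, but by taking the hidden state of the first token ([CLS]) and passing it through an extra dense layer with a tanh activation. A numpy sketch of the shape arithmetic, with random stand-in weights rather than real BERT parameters:

```python
import numpy as np

def bert_pooler(sequence_output, W, b):
    """BERT-style 'pooling': take the first ([CLS]) token's hidden state
    and apply a dense layer with tanh. Not mean/max pooling.
    sequence_output: [batch_size, seq_len, hidden_size]
    W: [hidden_size, hidden_size], b: [hidden_size]
    """
    first_token = sequence_output[:, 0, :]   # [batch_size, hidden_size]
    return np.tanh(first_token @ W + b)      # [batch_size, hidden_size]

# Stand-in shapes matching the L-12, H-768 module
batch, seq_len, hidden = 2, 128, 768
seq_out = np.random.randn(batch, seq_len, hidden)
W = 0.02 * np.random.randn(hidden, hidden)
b = np.zeros(hidden)
pooled = bert_pooler(seq_out, W, b)
print(pooled.shape)  # (2, 768)
```

So the sequence dimension disappears because only the first token is used, which matches the [batch_size, hidden_size] shape in the module docs.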
submitted by /u/ME_PhD
Call for Participation in Interview Study on Automated Machine Learning
Hi everyone,
We are a team of graduate researchers from UC Berkeley conducting a study on the usage of automated machine learning (auto-ML), including but not limited to Azure AutoML, Amazon SageMaker, TPOT, DataRobot, H2O Driverless AI, auto-sklearn, and Google AutoML. If you have used auto-ML (in the past or currently), we’d greatly appreciate it if you could participate in a one-hour interview with us. During the interview, we will ask you questions about your experience using automated machine learning, focusing on a specific problem you have worked on.
Please sign up for the interview using this form http://bit.ly/autoMLStudy. We will contact you with further information via email for the study. You will receive a $15 gift card at the completion of the interview as a token of appreciation for your time and insights!
submitted by /u/hcihci
[D] Is it possible to convert a regression problem into classification?
For example, in autoencoders, instead of using MSE or MAE, what if we picked some random images from the training set, plus all the images from the current batch, stacked them into a matrix, used that as the output layer, and then applied softmax plus cross-entropy to predict the correct image? Would this work?
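One way to sketch the idea (it is essentially a contrastive / noise-contrastive setup, with the batch plus the random training images acting as the candidate “classes”): score the decoder output against every candidate image, then apply softmax cross-entropy with the true image as the target. The similarity function and shapes below are illustrative assumptions, not a recommendation:

```python
import numpy as np

def softmax_over_candidates(decoded, candidates, target_idx):
    """Cross-entropy over a candidate set, treating images as classes.
    decoded:    [hidden] flattened decoder output
    candidates: [n_candidates, hidden] flattened candidate images
    Similarity here is negative squared distance; a learned
    embedding dot-product would be the more common choice.
    """
    logits = -np.sum((candidates - decoded) ** 2, axis=1)
    logits = logits - logits.max()               # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[target_idx])            # cross-entropy loss

rng = np.random.default_rng(0)
cands = rng.normal(size=(8, 64))                 # 8 candidate "images"
decoded = cands[3] + 0.01 * rng.normal(size=64)  # reconstruction near candidate 3
loss = softmax_over_candidates(decoded, cands, target_idx=3)
print(loss)  # small, since candidate 3 is by far the closest
```

A caveat the sketch makes visible: the gradient only pushes the reconstruction to be closer to the target than to the sampled negatives, so the quality of the signal depends heavily on how many (and which) negative images are in the candidate set.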
submitted by /u/DeMorrr
[D] What content do you feel is missing from the world of ML podcasts?
Hey all. I often feel like this is the age of podcasts, and in the world of ML podcasts have been popping up left and right (my personal favorite is the Artificial Intelligence podcast by Lex Fridman).
In a space that almost feels saturated, have you ever felt like something is missing? A certain angle on ML, application, etc.?
submitted by /u/nevereallybored