Author: torontoai

[D] Do CNNs understand semantic relationships between their classes?

Written on October 23, 2019. Posted in Reddit MachineLearning.

Hi all,

How do CNNs understand “compositional semantic relationships” between their classes? The problem exists in the entire field, but I’m referencing this paper in particular: http://gandissect.csail.mit.edu/

In the 3rd paragraph of the introduction of the paper, Bau et al. say (emphasis mine):

To a human observer, a well-trained GAN appears to have learned facts about the objects in the image: for example, a door can appear on a building but not on a tree. We wish to understand how a GAN represents such structure. Do the objects emerge as pure pixel patterns without any explicit representation of objects such as doors and trees, or does the GAN contain internal variables that correspond to the objects that humans perceive?

From my (limited) understanding of CNNs, they take in the input image (HxWx3 channels) and pass it through a bunch of filters and maxpool layers. Each maxpool layer reduces the HxW of the matrix. Each filter layer increases the depth of the matrix as we move away from the image and toward the “highest levels of representation.” In a sense, we’re abstracting toward higher and higher level features as the receptive field of our neurons increases and we’re able to model longer-distance relationships.

Finally, the last layer of the CNN is connected to the class output by a fully connected layer.
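The shape bookkeeping described above can be traced with a few lines of arithmetic. This is only a sketch: the layer list, its tuple format, and the 32x32 input size are made up for the example, and no real network is being described.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(h, w, channels, layers):
    """Return the (H, W, C) shape after each layer.

    `layers` holds ("conv", out_channels, kernel, padding) or ("pool", kernel)
    tuples; this format is hypothetical and exists only for this example.
    """
    shapes = [(h, w, channels)]
    for layer in layers:
        if layer[0] == "conv":
            _, out_channels, kernel, padding = layer
            h = conv_out(h, kernel, padding=padding)
            w = conv_out(w, kernel, padding=padding)
            channels = out_channels  # depth grows with the number of filters
        else:
            # Max pooling with stride equal to the window size shrinks H and W.
            _, kernel = layer
            h = conv_out(h, kernel, stride=kernel)
            w = conv_out(w, kernel, stride=kernel)
        shapes.append((h, w, channels))
    return shapes


layers = [("conv", 16, 3, 1), ("pool", 2), ("conv", 32, 3, 1), ("pool", 2)]
shapes = trace_shapes(32, 32, 3, layers)
for shape in shapes:
    print(shape)
```

Each pooling step halves H and W while each filter layer sets the depth, so the final 8x8x32 tensor is what would be flattened and fed to the fully connected layer.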

From what I’ve read, it seems like the field is a bit split on this. You have this paper saying

oh look! GANs (and thus CNNs) can understand relationships between classes because doors don’t appear in the sky

But others also say it’s an explicit shortcoming of the convolution function — that the spatial equivariance of convolution means it inherently cannot understand these relationships.

A CNN only looks for 2 eyes, 1 nose, and 1 mouth. It doesn’t care that the eyes are parallel and above the nose, or that the nose is above the mouth!

————

My take is that the CNN can understand broadly the correlations between classes because of the last, FC layer. As a result, it can understand that maybe standing is negatively correlated with beer, or that pens are correlated with paper. But, it can’t understand spatial relationships.

What do you guys think of this issue?

submitted by /u/sabot00

[D] A Unifying Framework of Bilinear LSTMs

Written on October 23, 2019. Posted in Reddit MachineLearning.

Disclaimer: this is my paper that I’ve been working on. If this sort of thing is not allowed on /r/ml, please let me know.

arXiv page: https://arxiv.org/abs/1910.10294

Abstract: This paper presents a novel unifying framework of bilinear LSTMs that can represent and utilize the nonlinear interaction of the input features present in sequence datasets for achieving superior performance over a linear LSTM and yet not incur more parameters to be learned. To realize this, our unifying framework allows the expressivity of the linear vs. bilinear terms to be balanced by correspondingly trading off between the hidden state vector size vs. approximation quality of the weight matrix in the bilinear term so as to optimize the performance of our bilinear LSTM, while not incurring more parameters to be learned. We empirically evaluate the performance of our bilinear LSTM in several language-based sequence learning tasks to demonstrate its general applicability.

Comments: This approach is novel because it considers improvement through the use of bilinear neurons (essentially polynomial regression + nonlinearity) as a building block. This is typically not done in neural networks as it is typically accepted that linear neuron + nonlinearity is sufficient as a universal approximator. However, we find that performance improvement can be achieved without incurring additional learnable parameters if bilinear neurons are used. It should be noted that the original proof on the universal approximability of linear neurons (Cybenko, 1989) does not show that they are efficient.
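To make the "bilinear neuron" idea concrete, here is a generic sketch of a single unit with a quadratic term next to the usual linear one. It is not the paper's LSTM formulation; the function names and the low-rank construction of W are assumptions chosen for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear_neuron(x, v, b):
    """sigmoid(v . x + b): the usual linear neuron plus nonlinearity."""
    return sigmoid(sum(vi * xi for vi, xi in zip(v, x)) + b)


def bilinear_neuron(x, W, v, b):
    """sigmoid(x^T W x + v . x + b): adds pairwise interactions of the inputs."""
    n = len(x)
    quad = sum(x[i] * W[i][j] * x[j] for i in range(n) for j in range(n))
    return sigmoid(quad + sum(vi * xi for vi, xi in zip(v, x)) + b)


def low_rank(P, Q):
    """Build W = sum_k outer(P[k], Q[k]); r vector pairs give a rank <= r matrix."""
    n = len(P[0])
    return [[sum(p[i] * q[j] for p, q in zip(P, Q)) for j in range(n)]
            for i in range(n)]


x = [1.0, 2.0]
W = low_rank([[1.0, 0.0]], [[0.0, 1.0]])  # rank-1 W that picks out x[0] * x[1]
print(linear_neuron(x, [0.0, 0.0], 0.0))
print(bilinear_neuron(x, W, [0.0, 0.0], 0.0))
```

A full W for n inputs has n^2 entries, while a rank-r factorization stores 2*r*n numbers, which is one way a bilinear term can be traded against parameter count.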

submitted by /u/ml_mohit

Learning to Smell: Using Deep Learning to Predict the Olfactory Properties of Molecules

Written on October 23, 2019. Posted in Google.

Posted by Alexander B Wiltschko, Senior Research Scientist, Google Research

Smell is a sense shared by an incredible range of living organisms, and plays a critical role in how they analyze and react to the world. For humans, our sense of smell is tied to our ability to enjoy food and can also trigger vivid memories. Smell allows us to appreciate all of the fragrances that abound in our everyday lives, be they the proverbial roses, a batch of freshly baked cookies, or a favorite perfume. Yet despite its importance, smell has not received the same level of attention from machine learning researchers as have vision and hearing.

Odor perception in humans is the result of the activation of 400 different types of olfactory receptors (ORs), expressed in 1 million olfactory sensory neurons (OSNs), in a small patch of tissue called the olfactory epithelium. These OSNs send signals to the olfactory bulb, and then to further structures in the brain. Based on analogous advances in deep learning for sight and sound, it should be possible to directly predict the end sensory result of an input molecule, even without knowing the intricate details of all the systems involved. Solving the odor prediction problem would aid in discovering new synthetic odorants, thereby reducing the ecological impact of harvesting natural products. Inspection of the resulting olfactory models may even lead to new insights into the biology of smell.

Small odorant molecules are the most basic building blocks of flavors and fragrances, and therefore represent the simplest version of the odor prediction problem. Yet each molecule can have multiple odor descriptors. Vanillin, for example, has descriptors such as sweet, vanilla, creamy, and chocolate, with some notes being more apparent than others. So odor prediction is also a multi-label classification problem.
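As a small illustration of the multi-label framing, the sketch below encodes vanillin's descriptors as a multi-hot target and thresholds one independent sigmoid per descriptor. The vocabulary (including "fruity" and "woody"), the logits, and the 0.5 threshold are made up for the example and are not taken from the paper.

```python
import math

# Hypothetical descriptor vocabulary; a real one would be far larger.
DESCRIPTORS = ["sweet", "vanilla", "creamy", "chocolate", "fruity", "woody"]


def multi_hot(labels, vocabulary=DESCRIPTORS):
    """Encode a set of descriptors as a 0/1 vector over the vocabulary."""
    return [1 if name in labels else 0 for name in vocabulary]


def predict_labels(logits, threshold=0.5, vocabulary=DESCRIPTORS):
    """Apply one sigmoid per descriptor and keep those at or above threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [name for name, p in zip(vocabulary, probs) if p >= threshold]


vanillin_target = multi_hot({"sweet", "vanilla", "creamy", "chocolate"})
print(vanillin_target)

# Made-up model outputs: several descriptors can be active at once.
print(predict_labels([2.0, 3.1, 0.4, -0.2, -2.5, -3.0]))
```

Unlike a softmax over classes, the per-descriptor sigmoids do not compete, so a molecule can carry any number of descriptors.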

In “Machine Learning for Scent: Learning Generalizable Perceptual Representations of Small Molecules”, we leverage graph neural networks (GNNs), a kind of deep neural network designed to operate on graphs as input, to directly predict the odor descriptors for individual molecules, without using any handcrafted rules. We demonstrate that this approach yields significantly improved performance in odor prediction compared to current state-of-the-art and is a promising direction for future research.

Graph Neural Networks for Odor Prediction
Since molecules are analogous to graphs, with atoms forming the vertices and bonds forming the edges, GNNs are the natural model of choice for their understanding. But how does one translate the structure of a molecule into a graph representation? Initially, every node in the graph is represented as a vector, using any preferred featurization — atom identity, atom charge, etc. Then, in a series of message passing steps, every node broadcasts its current vector value to each of its neighbors. An update function then takes the collection of vectors sent to it, and generates an updated vector value. This process can be repeated many times, until finally all of the nodes in the graph are summarized into a single vector via summing or averaging. That single vector, representing the entire molecule, can then be passed into a fully connected network as a learned molecular featurization. This network outputs a prediction for odor descriptors, as provided by perfume experts.

Each node is represented as a vector, and each entry in the vector initially encodes some atomic-level information.

For each node we look at adjacent nodes and collect their information, which is then transformed with a neural network into new information for the centered node. This procedure is performed iteratively. Other variants of GNNs utilize edge and graph-level information.

Illustration of a GNN for odor prediction. We translate the structure of molecules into graphs that are fed into GNN layers to learn a better representation of the nodes. These nodes are reduced into a single vector and passed into a neural network that is used to predict multiple odor descriptors.
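The message passing and readout steps described above can be sketched in plain Python. This is a toy version under stated assumptions: the update function here is a simple sum standing in for the learned neural network, and the graph is the three heavy atoms of ethanol (C-C-O) with one-hot [is_C, is_O] features chosen for the example.

```python
def message_passing_step(features, neighbors, update):
    """One round: each node gathers its neighbors' vectors and gets a new vector."""
    new_features = {}
    for node, vec in features.items():
        incoming = [features[n] for n in neighbors[node]]
        if incoming:
            summed = [sum(vals) for vals in zip(*incoming)]
        else:
            summed = [0.0] * len(vec)
        new_features[node] = update(vec, summed)
    return new_features


def readout(features):
    """Summarize all node vectors into one molecule-level vector by summing."""
    return [sum(vals) for vals in zip(*features.values())]


def add_update(own, summed):
    # Stand-in for the learned update function: own vector plus neighbor sum.
    return [a + b for a, b in zip(own, summed)]


features = {"C1": [1.0, 0.0], "C2": [1.0, 0.0], "O": [0.0, 1.0]}
neighbors = {"C1": ["C2"], "C2": ["C1", "O"], "O": ["C2"]}

for _ in range(2):  # two message passing rounds
    features = message_passing_step(features, neighbors, add_update)

molecule_vector = readout(features)
print(molecule_vector)
```

In a trained model the resulting molecule vector would be passed to the fully connected network that predicts odor descriptors; here it only shows how information from neighboring atoms accumulates.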

This representation doesn’t know anything about spatial positions of atoms, and so it can’t distinguish stereoisomers, molecules made of the same atoms but in slightly different configurations that can smell different, such as (R)- and (S)-carvone. Nevertheless, we have found that even without distinguishing stereoisomers, in practice it is still possible to predict odor quite well.

For odor prediction, GNNs consistently demonstrate improved performance compared to previous state-of-the-art methods, such as random forests, which do not directly encode graph structure. The magnitude of the improvement depends on which odor one tries to predict.

Example of the performance of a GNN on odor descriptors against a strong baseline, as measured by the AUROC score. Example odor descriptors are picked randomly. Closer to 1.0 means better. In the majority of cases GNNs outperform the field-standard baseline substantially, with similar performance seen against other metrics (e.g., AUPRC, recall, precision).
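For readers unfamiliar with AUROC, it can be computed as the fraction of positive/negative pairs that a model ranks correctly. The sketch below uses that pairwise form on made-up labels and scores for a single hypothetical odor descriptor; it is not the evaluation code used in the paper.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. This is the pairwise (Mann-Whitney) form of AUROC.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 means the molecule has the descriptor, 0 means it does not.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(auroc(labels, scores))
```

Here five of the six positive/negative pairs are ordered correctly, giving 5/6; a perfect ranking gives 1.0 and random scoring gives about 0.5.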

Learning from the Model, and Extending It to Other Tasks
In addition to predicting odor descriptors, GNNs can be applied to other olfaction tasks. For example, take the case of classifying new or refined odor descriptors using only limited data. For each molecule, we extract a learned representation from an intermediate layer of the model that is optimized for our odor descriptors, which we call an “odor embedding”. One can think of this as an olfaction version of a color space, like RGB or CMYK. To see if this odor embedding is useful for predicting related but different tasks, we designed experiments that test our learned embedding on related tasks for which it was not originally designed. We then compared the performance of our odor embedding representation to a common chemoinformatic representation that encodes structural information of a molecule, but is agnostic to odor and found that the odor embedding generalized to several challenging new tasks, even matching state-of-the-art on some.

2D snapshot of our embedding space with some example odors highlighted. Left: Each odor is clustered in its own space. Right: The hierarchical nature of the odor descriptor. Shaded and contoured areas are computed with a kernel-density estimate of the embeddings.

Future Work
Within the realm of machine learning, smell remains the most elusive of the senses, and we’re excited to continue doing a small part to shed light on it through further fundamental research. The possibilities for future research are numerous, and touch on everything from designing new olfactory molecules that are cheaper and more sustainably produced, to digitizing scent, or even one day giving those without a sense of smell access to roses (and, unfortunately, also rotten eggs). We hope to also bring this problem to the attention of more of the machine learning world through the eventual creation and sharing of high-quality, open datasets.

Acknowledgements
This early research is the result of the work and advisement of a team of talented researchers and engineers in Google Brain — Benjamin Sanchez-Lengeling, Jennifer Wei, Brian Lee, Emily Reif, Carey Radebaugh, Max Bileschi, Yoni Halpern, and D. Sculley. We are delighted to have collaborated on this work with Richard Gerkin at ASU and Alán Aspuru-Guzik at the University of Toronto. We are of course building on an enormous amount of prior work, and have benefitted particularly from work by Justin Gilmer, George Dahl and others on fundamental methodology in GNNs, among many other works in neuroscience, statistics and chemistry. We are also grateful to helpful comments from Steven Kearnes, David Belanger, Joel Mainland, and Emily Mayhew.

[D] Suggestion for multi-digit number recognition/OCR approach

Written on October 23, 2019. Posted in Reddit MachineLearning.

Hello, I’m a final-year student in college.

Recently, I’ve been tasked with building a system that can recognize runners based on their bib numbers. My idea is to detect the runner first using Mask R-CNN and then run OCR for the bib numbers on the masked area of the processed image. Are there any suggestions for the best approach to the OCR step? Thanks.

submitted by /u/itsDitzy

[D] ML frameworks used at ICCV 2017 -> 2019: PyTorch 3 -> 253, TensorFlow 43 -> 91, Caffe 108 -> 18

Written on October 23, 2019. Posted in Reddit MachineLearning.

Perhaps the most surprising fact here is that there are still 18 papers using Caffe (not Caffe2!) in 2019. Also, interestingly, all of the papers using Caffe are from Chinese universities.

submitted by /u/programmerChilli

[P] Implementing Neuro evolution to build Snake game AI

Written on October 23, 2019. Posted in Reddit MachineLearning.

I am trying to implement NEAT for the snake game. My game logic is ready and working properly, and NEAT is configured. But even after 100 generations with a population size of 200 per generation, the snakes perform very poorly. I am using neat-python for this.

The game board is 300×300 with grid size of 15. Hence, food and each part of the snake is of size 15×15. Hence, STEP = 15 for snake movement. The neural network has 24 inputs and 4 outputs and no hidden layer as part of the initial NEAT configuration. Activation function used is sigmoid.

Below are the inputs:

snakeHeadX, snakeHeadY, snakeHeadBottomDist, snakeHeadRightDist, snakeTailX, snakeTailY, snakeLength, moveCount, moveToFood, food.x, food.y, foodBottomDist, foodRightDist, snakeFoodDistEuclidean, snakeFoodDistManhattan, viewDirections[0], viewDirections[1], viewDirections[2], viewDirections[3], viewDirections[4], viewDirections[5], viewDirections[6], viewDirections[7], deltaFoodDist

Here, viewDirections[0] – [7] denote what the snake finds looking in 8 different directions. In each direction, the snake checks for food and its own body. If it finds neither food nor body, the value for that direction is 0; if it finds only food, 1; if it finds only body, 2; and if it finds both body and food, 3. I have attached the implementation that builds the viewDirections list below as well.

The outputs are:

output[0] –> for moving up, output[1] –> for moving down, output[2] –> for moving left, output[3] –> for moving right

The problem is that the snake barely ever eats more than 2 pieces of food. It is unable to learn where the food is, reduce its distance to the food and ultimately eat it, while avoiding the wall and its own body at the same time. I would appreciate it if anyone here could point out what I am doing wrong, or what I am missing and need to incorporate to make this work.

Below is the eval_genome function:

    import math

    import neat
    import pygame

    # Snake, Food, WIN_WIDTH, WIN_HEIGHT and STEP are defined elsewhere in the project.


    def main(genomes, config):
        clock = pygame.time.Clock()
        win = pygame.display.set_mode((WIN_WIDTH, WIN_HEIGHT))

        # Evaluate every genome in the population by letting it play one game.
        for genome_id, g in genomes:
            net = neat.nn.FeedForwardNetwork.create(g, config)
            g.fitness = 0
            snake = Snake()
            food = Food(snake.body)
            run = True
            UP = DOWN = RIGHT = LEFT = MOVE_SNAKE = False
            moveToFood = 0
            score = 0
            moveCount = 0

            while run:
                clock.tick(90)
                for event in pygame.event.get():
                    if event.type == pygame.QUIT:
                        run = False

                # Build the 24 network inputs from the current game state.
                snakeHeadX = snake.body[0]['x']
                snakeHeadY = snake.body[0]['y']
                snakeTailX = snake.body[len(snake.body)-1]['x']
                snakeTailY = snake.body[len(snake.body)-1]['y']
                snakeLength = len(snake.body)
                snakeHeadBottomDist = WIN_HEIGHT - snakeHeadY - STEP
                snakeHeadRightDist = WIN_WIDTH - snakeHeadX - STEP
                foodBottomDist = WIN_HEIGHT - food.y - STEP
                foodRightDist = WIN_WIDTH - food.x - STEP
                snakeFoodDistEuclidean = math.sqrt((snakeHeadX - food.x)**2 + (snakeHeadY - food.y)**2)
                snakeFoodDistManhattan = abs(snakeHeadX - food.x) + abs(snakeHeadY - food.y)
                viewDirections = snake.checkDirections(food, UP, DOWN, LEFT, RIGHT)
                if not MOVE_SNAKE:
                    deltaFoodDist = 0

                outputs = net.activate((snakeHeadX, snakeHeadY, snakeHeadBottomDist, snakeHeadRightDist,
                                        snakeTailX, snakeTailY, snakeLength, moveCount, moveToFood,
                                        food.x, food.y, foodBottomDist, foodRightDist,
                                        snakeFoodDistEuclidean, snakeFoodDistManhattan,
                                        viewDirections[0], viewDirections[1], viewDirections[2],
                                        viewDirections[3], viewDirections[4], viewDirections[5],
                                        viewDirections[6], viewDirections[7], deltaFoodDist))

                # Pick the direction with the largest output, but never reverse.
                if (outputs[0] == max(outputs) and not DOWN):
                    snake.setDir(0,-1)
                    UP = True
                    LEFT = False
                    RIGHT = False
                    MOVE_SNAKE = True
                elif (outputs[1] == max(outputs) and not UP):
                    snake.setDir(0,1)
                    DOWN = True
                    LEFT = False
                    RIGHT = False
                    MOVE_SNAKE = True
                elif (outputs[2] == max(outputs) and not RIGHT):
                    snake.setDir(-1,0)
                    LEFT = True
                    UP = False
                    DOWN = False
                    MOVE_SNAKE = True
                elif (outputs[3] == max(outputs) and not LEFT):
                    snake.setDir(1,0)
                    RIGHT = True
                    UP = False
                    DOWN = False
                    MOVE_SNAKE = True
                elif (not MOVE_SNAKE):
                    # First move of the game: any direction is allowed.
                    if (outputs[0] == max(outputs)):
                        snake.setDir(0,-1)
                        UP = True
                        MOVE_SNAKE = True
                    elif (outputs[1] == max(outputs)):
                        snake.setDir(0,1)
                        DOWN = True
                        MOVE_SNAKE = True
                    elif (outputs[2] == max(outputs)):
                        snake.setDir(-1,0)
                        LEFT = True
                        MOVE_SNAKE = True
                    elif (outputs[3] == max(outputs)):
                        snake.setDir(1,0)
                        RIGHT = True
                        MOVE_SNAKE = True

                win.fill((0, 0, 0))
                food.showFood(win)

                if(MOVE_SNAKE):
                    snake.update()
                    newSnakeHeadX = snake.body[0]['x']
                    newSnakeHeadY = snake.body[0]['y']
                    newFoodDist = math.sqrt((newSnakeHeadX - food.x)**2 + (newSnakeHeadY - food.y)**2)
                    deltaFoodDist = newFoodDist - snakeFoodDistEuclidean
                    moveCount += 1
                    g.fitness += 0.01  # small reward for surviving a move

                    # Reward moving toward the food, penalize moving away.
                    if (deltaFoodDist < 0):
                        g.fitness += 5
                    else:
                        g.fitness -= 50

                    if(snake.collision()):
                        if score != 0:
                            print('FINAL SCORE IS: '+ str(score))
                        g.fitness -= 300
                        break

                snake.show(win)

                if(snake.eat(food,win)):
                    g.fitness += 15
                    score += 1
                    if score == 1 :
                        moveToFood = moveCount
                    else:
                        moveToFood = moveCount - moveToFood
                    food.foodLocation(snake.body)
                    food.showFood(win)

Below is the checkDirections function implemented in Snake class which gives the viewDirections list as output:

    # Methods of the Snake class.

    def checkDirections(self, food, up, down, left, right):
        '''
        Scan order (one entry per direction):
        x+STEP, y-STEP
        x+STEP, y+STEP
        x-STEP, y-STEP
        x-STEP, y+STEP
        x+STEP, y
        x, y-STEP
        x, y+STEP
        x-STEP, y
        '''
        view = []
        # Scan origin. If xdir/ydir hold the movement direction set by setDir(),
        # the head position self.body[0] may be the intended origin.
        x = self.xdir
        y = self.ydir
        view.append(self.check(x, y, STEP, -STEP, food.x, food.y))
        view.append(self.check(x, y, STEP, STEP, food.x, food.y))
        view.append(self.check(x, y, -STEP, -STEP, food.x, food.y))
        view.append(self.check(x, y, -STEP, STEP, food.x, food.y))
        view.append(self.check(x, y, STEP, 0, food.x, food.y))
        view.append(self.check(x, y, 0, -STEP, food.x, food.y))
        view.append(self.check(x, y, 0, STEP, food.x, food.y))
        view.append(self.check(x, y, -STEP, 0, food.x, food.y))

        # Mark the direction directly behind the snake as blocked.
        if up:
            view[6] = -999
        elif down:
            view[5] = -999
        elif left:
            view[4] = -999  # assignment; the posted code used == here
        elif right:
            view[7] = -999  # assignment; the posted code used == here

        return view

    def check(self, x, y, xIncrement, yIncrement, foodX, foodY):
        value = 0
        foodFound = False
        bodyFound = False

        # Walk in one direction until leaving the board.
        while (x >= 0 and x < WIN_WIDTH and y >= 0 and y < WIN_HEIGHT):
            x += xIncrement
            y += yIncrement
            if (not foodFound):
                if (foodX == x and foodY == y):
                    foodFound = True
            if (not bodyFound):
                for i in range(1, len(self.body)):
                    if ((x == self.body[i]['x']) and (y == self.body[i]['y'])):
                        bodyFound = True

        # 0: nothing, 1: food only, 2: body only, 3: both.
        if (not bodyFound and not foodFound):
            value = 0
        elif (not bodyFound and foodFound):
            value = 1
        elif (bodyFound and not foodFound):
            value = 2
        else:
            value = 3
        return value

submitted by /u/deepLearner92

[D] Layer Complexity of Recurrent NNs in the Transformer Paper

Written on October 23, 2019. Posted in Reddit MachineLearning.

https://arxiv.org/pdf/1706.03762.pdf Table 1 of this paper says the per-layer complexity of self-attention is N^2*d, which I understand. What I don’t understand is the complexity of recurrent NNs, which seems to be d^2*N. Does anyone know how this comes about?
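One way to read the table, sketched here as a rough operation count rather than quoted from the paper: a recurrent layer multiplies a d-dimensional hidden state by a d x d weight matrix once per position, giving N * d^2, while self-attention compares every position with every other position using d-dimensional dot products, giving N^2 * d. The function names are made up, and constant factors (gates, projections, multiple heads) are ignored.

```python
def recurrent_layer_ops(n, d):
    """Multiply-adds for n sequential steps, each a d x d matrix-vector product."""
    return n * d * d


def self_attention_ops(n, d):
    """Multiply-adds for the n x n score matrix (d-dimensional dot products)
    plus the weighted sum of n value vectors for each of the n positions."""
    return 2 * n * n * d


# Sequence length n versus representation size d.
for n, d in [(50, 512), (512, 512), (4096, 512)]:
    print(n, d, recurrent_layer_ops(n, d), self_attention_ops(n, d))
```

Under this count, self-attention is cheaper per layer when the sequence is shorter than the representation dimension and more expensive when it is much longer.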

submitted by /u/MichaelStaniek

[P] JoeyNMT: Minimalist neural machine translation for newbies written in Pytorch

Written on October 23, 2019. Posted in Reddit MachineLearning.

Our paper describing JoeyNMT was recently accepted at EMNLP, so we thought it would be a good time to present our project to a larger community. JoeyNMT started as a way to introduce students to neural machine translation methods without having to explain the intricacies of state-of-the-art systems. For the past year it has been used within our research group as a baseline system that is easily hackable and expandable. It has also been used at the Indaba Deep Learning school in Kenya and is a core tool in the masakhane.io project to train NMT on African languages.

Right now we have implemented

RNNs (LSTM/GRU) and transformers for encoding and decoding
Multiple attention models (MLP, Dot, Multi-head, and bilinear)
character, word-level, and byte-pair encoded inputs
Greedy decoding and beam search

Baseline models are available for English->{German, Latvian, Afrikaans, Zulu, Xitsonga, Northern Sotho, Setswana, isiZulu}

We have a GitHub repository, a blog post, and a paper for JoeyNMT. We’d love to have more contributors and cover more language pairs.

submitted by /u/statnlphd

Curing HIV…This is where you come in. [Research] [Project]

Written on October 23, 2019. Posted in Reddit MachineLearning.

I’m a viral immunologist at amfAR, The Foundation for AIDS Research. Our job is to cure HIV…. Which means we give money to scientists we think can help us achieve our goal. I’ve been working on an idea the past year to bring in data scientists to analyze existing HIV datasets to find predictors that could be useful in developing a cure. The idea has finally come to fruition in the form of this request for proposals.

I’d love your help to energize HIV cure research with the new data science approaches being developed in other fields. So if you are interested in $150K/year to analyze your heart out and help us find a cure, consider applying. If you need help finding an HIV cure researcher to partner with, message me.

submitted by /u/dr_ish

AI’s New Onramp: Meet the Data Science PC

Written on October 23, 2019. Posted in NVIDIA.

The trip to AI and big-data analytics is now just a click away. Starting today, three NVIDIA partners are selling online a new class of computers we call data science PCs.

The systems bundle the hardware and software data scientists need to hit an “on” button and start managing datasets and models to make AI predictions. Data science PCs tap NVIDIA TITAN RTX GPUs and RAPIDS software to deliver 3-6x speed-ups compared to CPU-only desktops.

Three experts in building high-end PCs — Digital Storm, Maingear and Puget Systems — are offering the products now. They’re targeting an expanding class of independent data scientists to help them achieve better results faster.

Benchmark: a data science PC handled extract-transform-load (ETL) and XGBoost training on a dataset derived from New York City taxis, delivering end-to-end predictions in one-sixth the time of a CPU-only desktop.

Some of the world’s largest and most innovative organizations are already using GPU-accelerated servers and workstations to tackle their demanding data-science jobs.

For example, Walmart’s supermarket of the future can compute in real time more than 1.6 terabytes of data generated per second using NVIDIA’s EGX platform. The Summit system at Oak Ridge National Laboratory can tap its 27,648 NVIDIA V100 Tensor Core GPUs to drive 3.3 exaflops of mixed-precision horsepower on AI tasks.

But data science isn’t just for large enterprises. Startups, researchers, students and enthusiasts are jumping into this burgeoning field. They’re contributing to the corporate momentum making the role of data scientist one of the fastest growing jobs in the U.S.

The data science PC aims to fuel this growing class of independent data science practitioners. The combination of powerful, pre-configured systems and a tested software stack can jumpstart their work.

The Speeds and Feeds

Under the hood, a data science PC includes one or two TITAN RTX GPUs, each with up to 24GB of memory. NVLink high-speed interconnect technology connects the two GPUs to tackle datasets that demand more GPU memory.

The systems can accommodate 48-128GB of main memory and storage options include drives that range up to 10TB.

Each data science PC will ship with Linux and RAPIDS, NVIDIA’s data science software stack, powered by its popular CUDA-X AI programming libraries.

NVIDIA RAPIDS eases the job of porting existing code for GPU acceleration. Its APIs are modeled after popular libraries used in data science. In many cases, it’s only necessary to change a few lines of code in order to tap the potential of GPU acceleration.

Here are some of the key elements of RAPIDS:

cuDF is a Python GPU data-frame library for loading, joining, aggregating, filtering and otherwise manipulating data. The API is designed to be similar to Pandas, so existing code easily maps to the GPU.

cuML accelerates popular machine learning algorithms, including XGBoost, PCA, K-means, k-Nearest Neighbors and more. It is closely aligned with scikit-learn.

cuGraph is a library of graph algorithms, similar to NetworkX, that works with data stored in a GPU data frame.

An ecosystem of startups in Inception, NVIDIA’s virtual accelerator program for startups focused on AI and data science, provides applications and services that run on top of RAPIDS. They include companies, such as Graphistry and OmniSci, that offer big-data visualization tools.

Data scientists can also use NVIDIA’s data science developer forum to ask questions and learn more about data science on GPUs.

The data science PC is here, ready to propel you to an AI future. Learn more from our partners Digital Storm, Maingear and Puget Systems.

The post AI’s New Onramp: Meet the Data Science PC appeared first on The Official NVIDIA Blog.
