Category: Reddit MachineLearning

[D] Suggestion for multi-digit number recognition/OCR approach

Written on October 23, 2019. Posted in Reddit MachineLearning.

Hello, I'm a final-year college student.

Recently, I've been tasked with building a system that can recognize runners based on their bib numbers. My idea is to detect the runner first using Mask R-CNN and then run OCR for the bib number on the masked area of the processed image. Are there any suggestions for the best approach to the OCR step? Thanks.
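The step between detection and OCR could look roughly like the sketch below. It assumes the detector returns a per-pixel boolean mask the same size as a grayscale image stored as nested lists; the function names are hypothetical, and the OCR call itself is left out.

```python
from typing import List, Optional, Tuple

Mask = List[List[bool]]
Image = List[List[int]]  # grayscale pixels, row-major (an assumption)


def mask_bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, or None for an empty mask."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, on in enumerate(row) if on]
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


def crop_masked_region(image: Image, mask: Mask, background: int = 0) -> Image:
    """Crop to the mask's bounding box and blank out pixels outside the mask."""
    box = mask_bounding_box(mask)
    if box is None:
        return []
    top, left, bottom, right = box
    return [
        [image[r][c] if mask[r][c] else background for c in range(left, right + 1)]
        for r in range(top, bottom + 1)
    ]
```

The cropped region is what would be handed to the OCR stage, so background clutter outside the runner never reaches it.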

submitted by /u/itsDitzy

[D] ML frameworks used at ICCV 2017 -> 2019: PyTorch 3 -> 253, TensorFlow 43 -> 91, Caffe 108 -> 18

Written on October 23, 2019. Posted in Reddit MachineLearning.

Perhaps the most surprising fact here is that there are still 18 papers using Caffe (not Caffe2!) in 2019. Also, interestingly, all of the papers using Caffe are from Chinese universities.

submitted by /u/programmerChilli

[P] Implementing Neuro evolution to build Snake game AI

Written on October 23, 2019. Posted in Reddit MachineLearning.

I am trying to implement NEAT for the snake game. My game logic is ready and working properly, and NEAT is configured. But even after 100 generations with a population size of 200 per generation, the snakes perform very poorly. I am using neat-python for this.

The game board is 300×300 with a grid size of 15, so the food and each segment of the snake are 15×15, and the snake moves in steps of STEP = 15. The neural network has 24 inputs, 4 outputs, and no hidden layer in the initial NEAT configuration. The activation function is sigmoid.

Below are the inputs:

snakeHeadX, snakeHeadY, snakeHeadBottomDist, snakeHeadRightDist, snakeTailX, snakeTailY, snakeLength, moveCount, moveToFood, food.x, food.y, foodBottomDist, foodRightDist, snakeFoodDistEuclidean, snakeFoodDistManhattan, viewDirections[0], viewDirections[1], viewDirections[2], viewDirections[3], viewDirections[4], viewDirections[5], viewDirections[6], viewDirections[7], deltaFoodDist

Here, viewDirections[0] to viewDirections[7] denote what the snake finds when looking in 8 different directions. In each direction, the snake checks for food and its own body. If it finds neither, the value for that direction is 0; if it finds only food, 1; if it finds only body, 2; and if it finds both, 3. I have attached the implementation that builds the viewDirections list below as well.
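For reference, the 0/1/2/3 encoding described above can be written as a single scan along one direction. This is a minimal sketch rather than the code from the post: the name `look`, the tuple-based cells, and the choice to start the scan one step away from the head position are all assumptions.

```python
from typing import List, Tuple

Cell = Tuple[int, int]


def look(head: Cell, direction: Cell, food: Cell, body: List[Cell],
         width: int, height: int) -> int:
    """Scan from the head along one direction until leaving the board.

    Returns 0 (nothing), 1 (food only), 2 (body only) or 3 (both),
    which is food_found + 2 * body_found.
    """
    dx, dy = direction
    x, y = head[0] + dx, head[1] + dy  # first cell after the head
    food_found = body_found = False
    while 0 <= x < width and 0 <= y < height:
        food_found = food_found or (x, y) == food
        body_found = body_found or (x, y) in body
        x, y = x + dx, y + dy
    return int(food_found) + 2 * int(body_found)
```

Calling it once per direction, with offsets such as (STEP, 0) or (-STEP, STEP), would produce the eight viewDirections values.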

The outputs are:

output[0] -> move up, output[1] -> move down, output[2] -> move left, output[3] -> move right

The problem is that the snake barely ever eats more than 2 pieces of food. It fails to learn where the food is, reduce its distance to the food, and ultimately eat it while avoiding the walls and its own body. I would appreciate help from anyone who can point out what I am doing wrong or what I am missing to make this work.
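One compact way to turn the four outputs into a move is sketched below. It is not the logic from the post: here, if the strongest output would reverse the snake onto itself, the next-strongest output is used instead, whereas the posted code keeps the current direction. The names `MOVES` and `choose_move` are hypothetical.

```python
from typing import Sequence, Tuple

# Output index -> (dx, dy), in the up/down/left/right order listed above.
MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]


def choose_move(outputs: Sequence[float], current: Tuple[int, int]) -> Tuple[int, int]:
    """Pick the highest-scoring move that is not a 180-degree reversal.

    `current` is the present direction, or (0, 0) if the snake has not moved yet.
    """
    opposite = (-current[0], -current[1])
    ranked = sorted(range(len(MOVES)), key=lambda i: outputs[i], reverse=True)
    for i in ranked:
        if MOVES[i] != opposite:
            return MOVES[i]
    return current  # not reached with four distinct moves
```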

Below is the eval_genome function:

import math

import neat
import pygame


def main(genomes, config):
    clock = pygame.time.Clock()
    win = pygame.display.set_mode((WIN_WIDTH, WIN_HEIGHT))

    # Evaluate each genome by letting its network play one game.
    for genome_id, g in genomes:
        net = neat.nn.FeedForwardNetwork.create(g, config)
        g.fitness = 0
        snake = Snake()
        food = Food(snake.body)
        run = True
        UP = DOWN = RIGHT = LEFT = MOVE_SNAKE = False
        moveToFood = 0
        score = 0
        moveCount = 0

        while run:
            clock.tick(90)
            for event in pygame.event.get():
                if event.type == pygame.QUIT:
                    run = False

            # Build the 24 network inputs from the current game state.
            snakeHeadX = snake.body[0]['x']
            snakeHeadY = snake.body[0]['y']
            snakeTailX = snake.body[len(snake.body)-1]['x']
            snakeTailY = snake.body[len(snake.body)-1]['y']
            snakeLength = len(snake.body)
            snakeHeadBottomDist = WIN_HEIGHT - snakeHeadY - STEP
            snakeHeadRightDist = WIN_WIDTH - snakeHeadX - STEP
            foodBottomDist = WIN_HEIGHT - food.y - STEP
            foodRightDist = WIN_WIDTH - food.x - STEP
            snakeFoodDistEuclidean = math.sqrt((snakeHeadX - food.x)**2 + (snakeHeadY - food.y)**2)
            snakeFoodDistManhattan = abs(snakeHeadX - food.x) + abs(snakeHeadY - food.y)
            viewDirections = snake.checkDirections(food, UP, DOWN, LEFT, RIGHT)
            if not MOVE_SNAKE:
                deltaFoodDist = 0

            outputs = net.activate((
                snakeHeadX, snakeHeadY, snakeHeadBottomDist, snakeHeadRightDist,
                snakeTailX, snakeTailY, snakeLength, moveCount, moveToFood,
                food.x, food.y, foodBottomDist, foodRightDist,
                snakeFoodDistEuclidean, snakeFoodDistManhattan,
                viewDirections[0], viewDirections[1], viewDirections[2], viewDirections[3],
                viewDirections[4], viewDirections[5], viewDirections[6], viewDirections[7],
                deltaFoodDist))

            # Take the strongest output unless it would reverse the snake.
            if (outputs[0] == max(outputs) and not DOWN):
                snake.setDir(0,-1)
                UP = True
                LEFT = False
                RIGHT = False
                MOVE_SNAKE = True
            elif (outputs[1] == max(outputs) and not UP):
                snake.setDir(0,1)
                DOWN = True
                LEFT = False
                RIGHT = False
                MOVE_SNAKE = True
            elif (outputs[2] == max(outputs) and not RIGHT):
                snake.setDir(-1,0)
                LEFT = True
                UP = False
                DOWN = False
                MOVE_SNAKE = True
            elif (outputs[3] == max(outputs) and not LEFT):
                snake.setDir(1,0)
                RIGHT = True
                UP = False
                DOWN = False
                MOVE_SNAKE = True
            elif (not MOVE_SNAKE):
                if (outputs[0] == max(outputs)):
                    snake.setDir(0,-1)
                    UP = True
                    MOVE_SNAKE = True
                elif (outputs[1] == max(outputs)):
                    snake.setDir(0,1)
                    DOWN = True
                    MOVE_SNAKE = True
                elif (outputs[2] == max(outputs)):
                    snake.setDir(-1,0)
                    LEFT = True
                    MOVE_SNAKE = True
                elif (outputs[3] == max(outputs)):
                    snake.setDir(1,0)
                    RIGHT = True
                    MOVE_SNAKE = True

            win.fill((0, 0, 0))
            food.showFood(win)

            if(MOVE_SNAKE):
                snake.update()
                newSnakeHeadX = snake.body[0]['x']
                newSnakeHeadY = snake.body[0]['y']
                newFoodDist = math.sqrt((newSnakeHeadX - food.x)**2 + (newSnakeHeadY - food.y)**2)
                deltaFoodDist = newFoodDist - snakeFoodDistEuclidean
                moveCount += 1
                g.fitness += 0.01
                # Reward moving toward the food, penalize moving away.
                if (deltaFoodDist < 0):
                    g.fitness += 5
                else:
                    g.fitness -= 50

                if(snake.collision()):
                    if score != 0:
                        print('FINAL SCORE IS: '+ str(score))
                    g.fitness -= 300
                    break

                snake.show(win)

                if(snake.eat(food,win)):
                    g.fitness += 15
                    score += 1
                    if score == 1 :
                        moveToFood = moveCount
                    else:
                        moveToFood = moveCount - moveToFood
                    food.foodLocation(snake.body)
                    food.showFood(win)

Below is the checkDirections function implemented in Snake class which gives the viewDirections list as output:

def checkDirections(self, food, up, down, left, right):
    '''
    x+STEP, y-STEP
    x+STEP, y+STEP
    x-STEP, y-STEP
    x-STEP, y+STEP
    x+STEP, y
    x, y-STEP
    x, y+STEP
    x-STEP, y
    '''
    view = []
    # NOTE: if xdir/ydir hold the movement direction rather than the head
    # position, the scan should start from self.body[0] instead.
    x = self.xdir
    y = self.ydir
    view.append(self.check(x, y, STEP, -STEP, food.x, food.y))
    view.append(self.check(x, y, STEP, STEP, food.x, food.y))
    view.append(self.check(x, y, -STEP, -STEP, food.x, food.y))
    view.append(self.check(x, y, -STEP, STEP, food.x, food.y))
    view.append(self.check(x, y, STEP, 0, food.x, food.y))
    view.append(self.check(x, y, 0, -STEP, food.x, food.y))
    view.append(self.check(x, y, 0, STEP, food.x, food.y))
    view.append(self.check(x, y, -STEP, 0, food.x, food.y))

    # Mask the direction directly behind the snake.
    if up == True:
        view[6] = -999
    elif down == True:
        view[5] = -999
    elif left == True:
        view[4] = -999  # was "==", a comparison with no effect
    elif right == True:
        view[7] = -999  # was "==", a comparison with no effect
    return view

def check(self, x, y, xIncrement, yIncrement, foodX, foodY):
    value = 0
    foodFound = False
    bodyFound = False

    # Walk along the direction until the scan leaves the board.
    while (x >= 0 and x < WIN_WIDTH and y >= 0 and y < WIN_HEIGHT):
        x += xIncrement
        y += yIncrement
        if (not foodFound):
            if (foodX == x and foodY == y):
                foodFound = True
        if (not bodyFound):
            for i in range(1, len(self.body)):
                if ((x == self.body[i]['x']) and (y == self.body[i]['y'])):
                    bodyFound = True

    if (not bodyFound and not foodFound):
        value = 0
    elif (not bodyFound and foodFound):
        value = 1
    elif (bodyFound and not foodFound):
        value = 2
    else:
        value = 3
    return value

submitted by /u/deepLearner92

[D] Layer Complexity of Recurrent NNs in the Transformer Paper

Written on October 23, 2019. Posted in Reddit MachineLearning.

https://arxiv.org/pdf/1706.03762.pdf Table 1 of this paper says the per-layer complexity of self-attention is N^2*d, which I understand. What I don't understand is the complexity of recurrent layers, which is given as d^2*N. Does anyone know where this comes from?
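One common way to read the recurrent entry is sketched below. It assumes a vanilla RNN whose input and hidden sizes are both d; the symbols W_h, W_x, h_t and x_t are introduced here for illustration and do not appear in the post.

```latex
% One recurrent step: two d-by-d matrix-vector products.
h_t = \tanh\left(W_h h_{t-1} + W_x x_t\right),
\qquad W_h, W_x \in \mathbb{R}^{d \times d}

% Each matrix-vector product costs O(d^2); there are N steps in the sequence.
\text{cost per step} = O(d^2)
\quad\Longrightarrow\quad
\text{cost per layer} = N \cdot O(d^2) = O(N d^2)

% For comparison, self-attention multiplies an N-by-d matrix by a d-by-N matrix.
Q K^{\top} : (N \times d)(d \times N) \;\Longrightarrow\; O(N^2 d)
```

Under this reading, the recurrent layer is cheaper when N > d and self-attention is cheaper when N < d.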

submitted by /u/MichaelStaniek

[P] JoeyNMT: Minimalist neural machine translation for newbies written in PyTorch

Written on October 23, 2019. Posted in Reddit MachineLearning.

Our paper describing JoeyNMT was recently accepted at EMNLP, so we thought it would be a good time to present our project to a larger community. JoeyNMT started as a way to introduce students to neural machine translation methods without having to explain the intricacies of state-of-the-art systems. It has now been in use within our research group for the past year as a baseline system that is easily hackable and extensible. It has also found use at the Indaba Deep Learning school in Kenya and is a core tool in the masakhane.io project for training NMT on African languages.

Right now we have implemented:

RNNs (LSTM/GRU) and transformers for encoding and decoding
Multiple attention models (MLP, Dot, Multi-head, and bilinear)
Character-level, word-level, and byte-pair-encoded inputs
Greedy decoding and beam search
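To make the last item concrete, here is a toy sketch of beam search in which a beam size of 1 reduces to greedy decoding. It is not JoeyNMT's implementation: `beam_search`, `toy_step`, and the probability table are hypothetical, and there is no length normalization.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: token prefix -> next-token log-probabilities.
StepFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(step: StepFn, beam_size: int, max_len: int,
                eos: str = "</s>") -> List[str]:
    """Return the best-scoring sequence found with a fixed-width beam."""
    beams: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    for _ in range(max_len):
        candidates = []
        for score, prefix in beams:
            if prefix and prefix[-1] == eos:
                candidates.append((score, prefix))  # finished: carry over unchanged
                continue
            for token, log_p in step(prefix).items():
                candidates.append((score + log_p, prefix + (token,)))
        # Keep only the beam_size highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        if all(p and p[-1] == eos for _, p in beams):
            break
    return list(beams[0][1])


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Toy stand-in for a decoder: a fixed table of next-token probabilities."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"z": 0.9, "w": 0.1},
    }
    probs = table.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}
```

With this table, greedy decoding commits to "a" first and ends with probability 0.33, while a beam of 2 keeps "b" alive and finds the sequence with probability 0.36.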

Baseline models are available for English->{German, Latvian, Afrikaans, Zulu, Xitsonga, Northern Sotho, Setswana, isiZulu}

We have a GitHub repository, a blog post, and a paper for JoeyNMT. We’d love to have more contributors and cover more language pairs.

submitted by /u/statnlphd

Curing HIV…This is where you come in. [Research] [Project]

Written on October 23, 2019. Posted in Reddit MachineLearning.

I’m a viral immunologist at amfAR, The Foundation for AIDS Research. Our job is to cure HIV…. Which means we give money to scientists we think can help us achieve our goal. I’ve been working on an idea the past year to bring in data scientists to analyze existing HIV datasets to find predictors that could be useful in developing a cure. The idea has finally come to fruition in the form of this request for proposals.

I’d love your help to energize HIV cure research with the new data science approaches being developed in other fields. So if you are interested in $150K/year to analyze your heart out and help us find a cure, consider applying. If you need help finding an HIV cure researcher to partner with, message me.

submitted by /u/dr_ish

[D] Kernel functions and neural networks

Written on October 23, 2019. Posted in Reddit MachineLearning.

I’ve been pondering this question and wanted to get some of your thoughts on it.

Kernel functions find the distance (or similarity) between two inputs relative to each other in some transformed space. Neural networks, on the other hand, find the exact location of the input in their transformed space. Are there benefits and downsides to each of the two transformations? Why are kernel functions used instead of specifying the direct transformation from input to transformed space?
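A small example may help frame the question. For the homogeneous degree-2 polynomial kernel on 2-D inputs, the kernel value equals an inner product under an explicit feature map, so both routes give the same number. The function names below are hypothetical and the sketch only covers this one kernel.

```python
import math
from typing import Sequence, Tuple


def poly2_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_features(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Plain inner product."""
    return sum(a * b for a, b in zip(u, v))
```

Here `poly2_kernel(x, y)` and `dot(poly2_features(x), poly2_features(y))` agree, but the kernel never builds the transformed vectors. That matters for kernels such as the Gaussian (RBF) kernel, whose corresponding feature space is infinite-dimensional and cannot be written out directly.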

submitted by /u/dramanautica

[P] MelGAN vocoder implementation in PyTorch

Written on October 23, 2019. Posted in Reddit MachineLearning.

Disclaimer: This is a third-party implementation. The original authors stated that they will be releasing code soon.

Recent research showed that a fully convolutional GAN called MelGAN can invert mel-spectrograms into raw audio in a non-autoregressive manner. The authors showed that MelGAN is lighter and faster than WaveGlow and can even generalize to unseen speakers when trained on the speech of 3 male and 3 female speakers.

I think this is a major breakthrough in TTS research, since both researchers and engineers can benefit from this fast, lightweight neural vocoder. So I’ve tried to implement it in PyTorch: see the GitHub link with audio samples below.

Debugging was quite painful while implementing this. Changing the update order of G/D mattered a lot, and my generator’s loss curve is still going up (though the results look good when compared to the original paper’s).

original paper: https://arxiv.org/abs/1910.06711
implementation: https://github.com/seungwonpark/melgan
audio samples: http://swpark.me/melgan/
audio samples from original paper: https://melgan-neurips.github.io

Figure 1 from “MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis”

submitted by /u/seungwonpark

[R] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Written on October 23, 2019. Posted in Reddit MachineLearning.

submitted by /u/hardmaru

[D] Overfitting vs. Generalization – a subtle difference

Written on October 22, 2019. Posted in Reddit MachineLearning.

In my view, overfitting does not necessarily imply a lack of generalization, just as generalization cannot be directly tied to the degree of overfitting.

An overfit model is a model that is tuned to give the highest performance (e.g. lowest loss) on the dataset it was trained on. This can be measured by the difference between the losses on the validation set and on the training set. In order to test for overfitting, the training and validation sets should have similar distributions. If that’s the case, an overfit model’s performance on the validation set will deviate from its training performance. This is because, even if the distributions are similar, the model is tuned to handle correctly only the samples it has seen in the training set.

As for generalization, it can only be evaluated between datasets (test and training) that have different distributions. Ideally, the test distribution will be the most heterogeneous of them all. In my opinion, this is the only way to really assess generalization: the difference between the losses on training versus testing set.
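The two quantities contrasted above can be written down as a pair of loss gaps. This is only a sketch of the definitions as stated in this post: the function name, the argument names, and the use of mean per-sample loss are assumptions.

```python
from statistics import mean
from typing import Dict, Sequence


def loss_gaps(train: Sequence[float], val_same_dist: Sequence[float],
              test_shifted: Sequence[float]) -> Dict[str, float]:
    """Compare mean per-sample losses against the training loss.

    val_same_dist: held-out data drawn from the training distribution.
    test_shifted:  held-out data drawn from a different distribution.
    """
    train_loss = mean(train)
    return {
        "overfitting_gap": mean(val_same_dist) - train_loss,
        "generalization_gap": mean(test_shifted) - train_loss,
    }
```

A large overfitting gap with a small generalization gap, or the reverse, is exactly the case where the two notions come apart.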

TLDR: Overfitting is indicated when a model underperforms on unseen data whose distribution is similar to that of the seen data. Generalization, on the other hand, is indicated by the performance difference between seen and unseen data with different distributions, where the unseen data ideally represents real-world distributions.

I think this is a misconception most have, even in industry.

What are your thoughts?

submitted by /u/eigenlaplace
