Category: Reddit MachineLearning

[D] Suggestion for multi-digit number recognition/OCR approach

Hello, I'm a final-year college student.

Recently, I've been tasked with building a system that recognizes runners by their bib numbers. My idea is to detect the runner first using Mask R-CNN, then run OCR for the bib number on the masked area of the processed image. Are there any suggestions for the best approach to the OCR step? Thanks.
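One way to hand the masked area to an OCR step is to blank out everything outside the instance mask and crop to the mask's bounding box. This is a minimal NumPy sketch. It assumes the detector returns a boolean mask with the same height and width as the image. The function name `crop_masked_region` is hypothetical, and the OCR call itself is left out.

```python
import numpy as np

def crop_masked_region(image, mask):
    """Zero out pixels outside `mask` and crop to the mask's bounding box.

    image: (H, W, C) array; mask: (H, W) boolean array (assumed detector output).
    Returns None when the mask is empty.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    top, bottom = ys.min(), ys.max() + 1
    left, right = xs.min(), xs.max() + 1
    # Broadcast the 2-D mask over the channel axis before blanking.
    masked = np.where(mask[..., None], image, 0)
    return masked[top:bottom, left:right]
```

The cropped array can then be passed to whichever text detector or OCR model is chosen for the bib digits.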

submitted by /u/itsDitzy

[P] Implementing Neuro evolution to build Snake game AI

I am trying to implement NEAT for the snake game. My game logic is ready and working properly, and NEAT is configured. But even after 100 generations with a population size of 200 per generation, the snakes perform very poorly. I am using neat-python for this.

The game board is 300×300 with a grid size of 15, so the food and each segment of the snake are 15×15 and STEP = 15 for snake movement. The neural network has 24 inputs, 4 outputs, and no hidden layer in the initial NEAT configuration. The activation function is sigmoid.

Below are the inputs:

snakeHeadX, snakeHeadY, snakeHeadBottomDist, snakeHeadRightDist, snakeTailX, snakeTailY, snakeLength, moveCount, moveToFood, food.x, food.y, foodBottomDist, foodRightDist, snakeFoodDistEuclidean, snakeFoodDistManhattan, viewDirections[0], viewDirections[1], viewDirections[2], viewDirections[3], viewDirections[4], viewDirections[5], viewDirections[6], viewDirections[7], deltaFoodDist

Here, viewDirections[0] – [7] denote what the snake finds when looking in 8 different directions. In each direction, the snake checks for food and its own body. If it finds neither food nor body, the value for that direction is 0; if it finds only food, it is 1; if it finds only body, it is 2; and if it finds both body and food, it is 3. I have attached the implementation that builds the viewDirections list below as well.
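The per-direction encoding described above reduces to a two-flag lookup. A small sketch (the helper name `encode_view` is hypothetical and not part of the posted code):

```python
def encode_view(food_found, body_found):
    """Return 0 (nothing), 1 (food only), 2 (body only) or 3 (both)."""
    return int(food_found) + 2 * int(body_found)
```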

The outputs are:

output[0] –> for moving up, output[1] –> for moving down, output[2] –> for moving left, output[3] –> for moving right
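Choosing a move from the four outputs amounts to an argmax over the network's activations. The sketch below is one possible way to do it, not the posted implementation. `DIRECTIONS` and `pick_direction` are illustrative names. When the top output would reverse the snake onto itself, this version falls back to the next-highest output, which is an assumption about the desired behaviour.

```python
# (dx, dy) for up, down, left, right, in the same order as the outputs.
DIRECTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]

def pick_direction(outputs, current=None):
    """Return the (dx, dy) of the highest output that is not a 180-degree turn.

    `current` is the snake's present (dx, dy), or None before the first move.
    """
    ranked = sorted(range(len(DIRECTIONS)), key=lambda i: outputs[i], reverse=True)
    for i in ranked:
        dx, dy = DIRECTIONS[i]
        if current is None or (dx, dy) != (-current[0], -current[1]):
            return (dx, dy)
    return current
```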

The problem is that the snake barely ever eats more than 2 pieces of food. It is unable to learn where the food is, reduce its distance to the food, and ultimately eat it while avoiding the walls and its own body. I need help: can anyone here point out what I am doing wrong, or what I am missing that I need to incorporate to make this work?

Below is the eval_genome function:

import math

import neat
import pygame

# Snake, Food, WIN_WIDTH, WIN_HEIGHT and STEP are defined elsewhere in the project.

def main(genomes, config):
    clock = pygame.time.Clock()
    win = pygame.display.set_mode((WIN_WIDTH, WIN_HEIGHT))
    for genome_id, g in genomes:
        net = neat.nn.FeedForwardNetwork.create(g, config)
        g.fitness = 0
        snake = Snake()
        food = Food(snake.body)
        run = True
        UP = DOWN = RIGHT = LEFT = MOVE_SNAKE = False
        moveToFood = 0
        score = 0
        moveCount = 0
        while run:
            clock.tick(90)
            for event in pygame.event.get():
                if event.type == pygame.QUIT:
                    run = False

            # Build the 24 network inputs from the current game state.
            snakeHeadX = snake.body[0]['x']
            snakeHeadY = snake.body[0]['y']
            snakeTailX = snake.body[len(snake.body)-1]['x']
            snakeTailY = snake.body[len(snake.body)-1]['y']
            snakeLength = len(snake.body)
            snakeHeadBottomDist = WIN_HEIGHT - snakeHeadY - STEP
            snakeHeadRightDist = WIN_WIDTH - snakeHeadX - STEP
            foodBottomDist = WIN_HEIGHT - food.y - STEP
            foodRightDist = WIN_WIDTH - food.x - STEP
            snakeFoodDistEuclidean = math.sqrt((snakeHeadX - food.x)**2 + (snakeHeadY - food.y)**2)
            snakeFoodDistManhattan = abs(snakeHeadX - food.x) + abs(snakeHeadY - food.y)
            viewDirections = snake.checkDirections(food, UP, DOWN, LEFT, RIGHT)
            if not MOVE_SNAKE:
                deltaFoodDist = 0

            outputs = net.activate((
                snakeHeadX, snakeHeadY, snakeHeadBottomDist, snakeHeadRightDist,
                snakeTailX, snakeTailY, snakeLength, moveCount, moveToFood,
                food.x, food.y, foodBottomDist, foodRightDist,
                snakeFoodDistEuclidean, snakeFoodDistManhattan,
                viewDirections[0], viewDirections[1], viewDirections[2], viewDirections[3],
                viewDirections[4], viewDirections[5], viewDirections[6], viewDirections[7],
                deltaFoodDist))

            # Follow the strongest output unless it would reverse the snake.
            if (outputs[0] == max(outputs) and not DOWN):
                snake.setDir(0,-1)
                UP = True
                LEFT = False
                RIGHT = False
                MOVE_SNAKE = True
            elif (outputs[1] == max(outputs) and not UP):
                snake.setDir(0,1)
                DOWN = True
                LEFT = False
                RIGHT = False
                MOVE_SNAKE = True
            elif (outputs[2] == max(outputs) and not RIGHT):
                snake.setDir(-1,0)
                LEFT = True
                UP = False
                DOWN = False
                MOVE_SNAKE = True
            elif (outputs[3] == max(outputs) and not LEFT):
                snake.setDir(1,0)
                RIGHT = True
                UP = False
                DOWN = False
                MOVE_SNAKE = True
            elif (not MOVE_SNAKE):
                if (outputs[0] == max(outputs)):
                    snake.setDir(0,-1)
                    UP = True
                    MOVE_SNAKE = True
                elif (outputs[1] == max(outputs)):
                    snake.setDir(0,1)
                    DOWN = True
                    MOVE_SNAKE = True
                elif (outputs[2] == max(outputs)):
                    snake.setDir(-1,0)
                    LEFT = True
                    MOVE_SNAKE = True
                elif (outputs[3] == max(outputs)):
                    snake.setDir(1,0)
                    RIGHT = True
                    MOVE_SNAKE = True

            win.fill((0, 0, 0))
            food.showFood(win)

            if (MOVE_SNAKE):
                snake.update()
                newSnakeHeadX = snake.body[0]['x']
                newSnakeHeadY = snake.body[0]['y']
                newFoodDist = math.sqrt((newSnakeHeadX - food.x)**2 + (newSnakeHeadY - food.y)**2)
                deltaFoodDist = newFoodDist - snakeFoodDistEuclidean
                moveCount += 1
                # Small reward for surviving, larger reward or penalty for
                # moving towards or away from the food.
                g.fitness += 0.01
                if (deltaFoodDist < 0):
                    g.fitness += 5
                else:
                    g.fitness -= 50

            if (snake.collision()):
                if score != 0:
                    print('FINAL SCORE IS: ' + str(score))
                g.fitness -= 300
                break

            snake.show(win)
            if (snake.eat(food, win)):
                g.fitness += 15
                score += 1
                if score == 1:
                    moveToFood = moveCount
                else:
                    moveToFood = moveCount - moveToFood
                food.foodLocation(snake.body)
                food.showFood(win)

Below is the checkDirections function implemented in Snake class which gives the viewDirections list as output:

def checkDirections(self, food, up, down, left, right):
    '''
    Offsets checked, in order:
    x+STEP, y-STEP
    x+STEP, y+STEP
    x-STEP, y-STEP
    x-STEP, y+STEP
    x+STEP, y
    x, y-STEP
    x, y+STEP
    x-STEP, y
    '''
    view = []
    x = self.xdir
    y = self.ydir
    view.append(self.check(x, y, STEP, -STEP, food.x, food.y))
    view.append(self.check(x, y, STEP, STEP, food.x, food.y))
    view.append(self.check(x, y, -STEP, -STEP, food.x, food.y))
    view.append(self.check(x, y, -STEP, STEP, food.x, food.y))
    view.append(self.check(x, y, STEP, 0, food.x, food.y))
    view.append(self.check(x, y, 0, -STEP, food.x, food.y))
    view.append(self.check(x, y, 0, STEP, food.x, food.y))
    view.append(self.check(x, y, -STEP, 0, food.x, food.y))

    # Mark the direction directly behind the head as blocked.
    if up:
        view[6] = -999
    elif down:
        view[5] = -999
    elif left:
        view[4] = -999
    elif right:
        view[7] = -999
    return view

def check(self, x, y, xIncrement, yIncrement, foodX, foodY):
    value = 0
    foodFound = False
    bodyFound = False
    # Walk one STEP at a time in the given direction until leaving the board.
    while (x >= 0 and x < WIN_WIDTH and y >= 0 and y < WIN_HEIGHT):
        x += xIncrement
        y += yIncrement
        if (not foodFound):
            if (foodX == x and foodY == y):
                foodFound = True
        if (not bodyFound):
            for i in range(1, len(self.body)):
                if ((x == self.body[i]['x']) and (y == self.body[i]['y'])):
                    bodyFound = True

    # 0 = nothing, 1 = food only, 2 = body only, 3 = both.
    if (not bodyFound and not foodFound):
        value = 0
    elif (not bodyFound and foodFound):
        value = 1
    elif (bodyFound and not foodFound):
        value = 2
    else:
        value = 3
    return value

submitted by /u/deepLearner92

[P] JoeyNMT: Minimalist neural machine translation for newbies written in PyTorch

Our paper describing JoeyNMT was recently accepted at EMNLP, so we thought it would be a good time to present our project to a larger community. It started as a way to introduce students to neural machine translation methods without having to explain the intricacies of state-of-the-art systems. For the past year, JoeyNMT has been in use within our research group as a baseline system that is easily hackable and expandable. It has also found use at the Indaba Deep Learning school in Kenya and is a core tool used in the masakhane.io project to train NMT on African languages.

Right now we have implemented

  • RNNs (LSTM/GRU) and transformers for encoding and decoding
  • Multiple attention models (MLP, Dot, Multi-head, and bilinear)
  • Character, word-level, and byte-pair encoded inputs
  • Greedy decoding and beam search

Baseline models are available for English->{German, Latvian, Afrikaans, Zulu, Xitsonga, Northern Sotho, Setswana, isiZulu}

We have a github, blog post, and paper for JoeyNMT. We’d love to have more contributors and cover more language pairs.

submitted by /u/statnlphd

Curing HIV…This is where you come in. [Research] [Project]

I’m a viral immunologist at amfAR, The Foundation for AIDS Research. Our job is to cure HIV, which means we give money to scientists we think can help us achieve that goal. For the past year I’ve been working on an idea to bring in data scientists to analyze existing HIV datasets and find predictors that could be useful in developing a cure. The idea has finally come to fruition in the form of this request for proposals.

I’d love your help to energize HIV cure research with the new data science approaches being developed in other fields. So if you are interested in $150K/year to analyze your heart out and help us find a cure, consider applying. If you need help finding an HIV cure researcher to partner with, message me.

submitted by /u/dr_ish

[D] Kernel functions and neural networks

I’ve been pondering this question and wanted to get some of your thoughts on it.

Kernel functions find the distance between two inputs relative to each other in some transformed space. Neural networks, on the other hand, find the exact location of the input in their transformed space. What are the benefits and downsides of the two transformations? Why are kernel functions used instead of specifying the direct transformation from the input to the transformed space?
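A worked example may make the contrast concrete. For the degree-2 polynomial kernel k(x, y) = (x · y)^2 in two dimensions, the explicit feature map is phi(x) = (x1^2, sqrt(2)·x1·x2, x2^2). The kernel returns the same inner product without ever building phi. The sketch below checks this numerically with NumPy (the function names are illustrative):

```python
import numpy as np

def poly2_kernel(x, y):
    """Implicit route: similarity computed directly from the raw inputs."""
    return float(np.dot(x, y)) ** 2

def poly2_features(x):
    """Explicit route: map a 2-D input into the 3-D feature space."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

implicit = poly2_kernel(x, y)
explicit = float(np.dot(poly2_features(x), poly2_features(y)))
# Both routes give the same value, up to floating-point error.
```

For kernels such as the RBF kernel, the corresponding feature space is infinite-dimensional, so the direct transformation cannot be written out at all. That is one common reason to work with the kernel instead.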

submitted by /u/dramanautica

[P] MelGAN vocoder implementation in PyTorch


Disclaimer: This is a third-party implementation. The original authors stated that they will be releasing code soon.

Recent research showed that a fully convolutional GAN called MelGAN can invert a mel-spectrogram into raw audio in a non-autoregressive manner. The authors showed that MelGAN is lighter and faster than WaveGlow, and that it can even generalize to unseen speakers when trained on the speech of 3 male and 3 female speakers.

I think this is a major breakthrough in TTS research, since both researchers and engineers can benefit from this fast, lightweight neural vocoder. So I’ve tried to implement it in PyTorch: see the GitHub link with audio samples below.

Debugging was quite painful while implementing this. Changing the update order of G/D mattered a lot, and my generator’s loss curve is still going up. (Though the results look good when compared to the original paper’s.)

Figure 1 from “MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis”

submitted by /u/seungwonpark

[D] Overfitting vs. Generalization – a subtle difference

In my view, overfitting does not necessarily imply a lack of generalization, just as generalization cannot be directly tied to the degree of overfitting.

An overfit model is a model that is tuned to generate the highest performance (e.g. lowest loss) on the dataset it was trained with. This can be tested by the difference between the losses on the validation set and on the training set. In order to test for overfitting, training and validation sets should have similar distributions. If that’s the case, an overfit model will deviate in performance on the validation set from the training performance. This is because, even if the distributions are similar, the model is tuned to pick up correctly only the samples it has seen on the training set.

As for generalization, it can only be evaluated between datasets (test and training) that have different distributions. Ideally, the test distribution will be the most heterogeneous of them all. In my opinion, this is the only way to really assess generalization: the difference between the losses on training versus testing set.
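The two checks described so far can be written as simple loss differences. This sketch follows the post's definitions. The function name and the loss values are made up for illustration.

```python
def loss_gap(train_loss, other_loss):
    """Positive values mean the model does worse on the other split."""
    return other_loss - train_loss

train_loss = 0.10
val_loss = 0.35   # validation set: same distribution as the training set
test_loss = 0.60  # test set: different distribution

overfitting_gap = loss_gap(train_loss, val_loss)
generalization_gap = loss_gap(train_loss, test_loss)
```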

TLDR: Overfitting is indicated when a model underperforms on unseen data whose distribution is similar to that of the seen data. Generalization, on the other hand, is indicated by the performance difference between seen and unseen data with different distributions, where the unseen data ideally represents real-world distributions.

I think this is a misconception most have, even in industry.

What are your thoughts?

submitted by /u/eigenlaplace