Category: Reddit MachineLearning

[D] How do object detection algorithms and feature extractor networks work together for action detection?

Written on August 8, 2019. Posted in Reddit MachineLearning.

I’m talking about architectures such as AlexNet and Inception, and object detectors like YOLO and SSD. I’ve read a bit online and I’m really confused about how they work together.
Let’s say I want to detect a specific object/person in a video and put a box around them with a label describing the state of that object/person. How would that work? What steps would the object detector and the feature extractor each take? A workflow for this would be really helpful.
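
One common way to combine the two, sketched here under assumptions, is a two-stage pipeline: a detector proposes boxes for each frame, each box is cropped, and a separate classifier built on a feature extractor labels the state of the crop. The names `detect`, `classify_state`, and `annotate_frame` are hypothetical stand-ins, not functions from YOLO, SSD, or any real library, and the dummy implementations exist only so the sketch runs. Detectors such as YOLO and SSD already contain their own backbone feature extractor, so the second stage here would be an additional network.

```python
import numpy as np

def detect(frame):
    """Hypothetical detector stub: returns a list of (x1, y1, x2, y2) boxes."""
    h, w = frame.shape[:2]
    return [(0, 0, w // 2, h // 2)]  # pretend one object fills the top-left quadrant

def classify_state(crop):
    """Hypothetical state classifier stub operating on a cropped region."""
    return "bright" if crop.mean() > 127 else "dark"

def annotate_frame(frame, detector=detect, classifier=classify_state):
    """Run the detector, crop each box, classify the crop, return (box, label) pairs."""
    results = []
    for (x1, y1, x2, y2) in detector(frame):
        crop = frame[y1:y2, x1:x2]
        results.append(((x1, y1, x2, y2), classifier(crop)))
    return results

# Stand-in for one video frame: black image with a white top-left quadrant.
frame = np.zeros((64, 64, 3), dtype=np.uint8)
frame[:32, :32] = 255
annotations = annotate_frame(frame)
```

For a video, the same function would be called once per frame, and the returned boxes and labels drawn on top of it.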

submitted by /u/LessTell

[D] Effect of Oversampling on classifiers, when combined with image transformations

Written on August 8, 2019. Posted in Reddit MachineLearning.

I am trying to understand the negative consequences of oversampling in the context of image classification. If I am using a decent amount of image transformations, I believe it will effectively be equivalent to SMOTE for tabular data, since I am not exactly repeating any image in a batch. Does the behaviour and test set accuracy of a classifier depend in any way on the actual class distribution in the train set, and am I doing any harm by oversampling?

To take an example, I was training a classifier on a dataset with 5 classes and heavy class imbalance. To balance it out, I oversampled the minority classes so that all classes have an equal number of images. This caused a significant performance drop on the test set that I have, while cross-validation performance was fairly high on the oversampled set. When analysing the class distributions, I saw that for the original train set the classes, ordered by decreasing number of samples, were 1,3,2,4,5. The test set has the ordering 3,1,2,4,5, but the predictions after training on oversampled data have the ordering 4,3,2,5,1. Mathematically speaking, how can this behaviour of over-predicting a less frequently occurring class be explained?
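
One standard way to reason about this is Bayes' rule, assuming the classifier outputs calibrated probabilities and only the class priors differ between training and test data. A model trained on balanced data behaves as if every class were equally likely, so rare classes get predicted more often than their test frequency warrants. Under that assumption the posteriors can be re-weighted by the ratio of test to training priors. The function name `adjust_for_priors` and the two-class numbers below are illustrative, not taken from the experiment above.

```python
import numpy as np

def adjust_for_priors(probs, train_priors, test_priors):
    """Re-weight posteriors by test_prior / train_prior, then renormalize each row."""
    probs = np.asarray(probs, dtype=float)
    weights = np.asarray(test_priors, dtype=float) / np.asarray(train_priors, dtype=float)
    adjusted = probs * weights
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Illustrative numbers: model trained on a balanced (oversampled) two-class set.
train_priors = [0.5, 0.5]
test_priors = [0.9, 0.1]          # class 1 is rare at test time
probs = np.array([[0.4, 0.6]])    # the balanced model slightly prefers the rare class

adjusted = adjust_for_priors(probs, train_priors, test_priors)
```

With these numbers the unadjusted model picks the rare class, while the prior-adjusted probabilities favour the common one.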

submitted by /u/Atom_101

[D] What papers should I know when it comes to text recognition with LSTM/GRU

Written on August 8, 2019. Posted in Reddit MachineLearning.

Is there maybe a survey paper that summarizes the different architectures that are widely used for word-based text recognition/classification? Or can you recommend something, or are there some must-reads? Thanks!

submitted by /u/jthat92

[P] Description of the tool for Machine Learning researchers/engineers.

Written on August 8, 2019. Posted in Reddit MachineLearning.

The main reasons for creating a service that visualizes inheritance between Machine Learning models are described in this article.

Other reasons are not so obvious, and will be described later.

https://arxiv.org/pdf/1908.01874.pdf

submitted by /u/postmachines

[D] What are the current SOTA architectures for NLP information extraction & question answering?

Written on August 8, 2019. Posted in Reddit MachineLearning.

Been primarily working in a different field of DL for a while, but got a project coming up related to NLP. I’ve done some research, and the models that show up most frequently are GPT-2, BERT, and ELMo. However, I am under the impression that these are burying others that may be better suited for the task.

If it’s of relevance: my domain expertise is in medicine, and I intend to use it for medical purposes.

submitted by /u/Naveos

[R] DoorGym: A Scalable Door Opening Environment And Baseline Agent

Written on August 8, 2019. Posted in Reddit MachineLearning.

submitted by /u/sensetime

[P] For NLP beginners, simple PyTorch implementation of Language Modeling

Written on August 8, 2019. Posted in Reddit MachineLearning.

A step-by-step tutorial on how to implement a simple language model and adapt it to Wikipedia text.

Pre-trained BERT and XLNet models are publicly available! But for NLP beginners, they can be hard to fully understand and then use or adapt. For those readers, I covered the whole end-to-end implementation process for language modeling, using the recurrent networks we already know.
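
At its core, language modeling with a recurrent network means predicting each next token from the ones before it. A minimal sketch of how training pairs are formed, assuming simple whitespace tokenization; the helper name `make_lm_pairs` is hypothetical and not taken from the linked repository:

```python
def make_lm_pairs(text):
    """Map tokens to integer ids and return (inputs, targets) shifted by one position."""
    tokens = text.split()
    vocab = {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
    ids = [vocab[tok] for tok in tokens]
    # The target at each position is simply the next token in the sequence.
    return ids[:-1], ids[1:], vocab

inputs, targets, vocab = make_lm_pairs("the cat sat on the mat")
```

Each input id is paired with the id of the token that follows it, which is what the recurrent network is trained to predict.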

I hope that this repo can be a good solution for people who want to have their own language model 🙂

https://github.com/lyeoni/pretraining-for-language-understanding

submitted by /u/lyeoni

[D] Basic RNN predicting more than 1 timestep w/ Keras (Python).

Written on August 8, 2019. Posted in Reddit MachineLearning.

I’ve been working with RNNs for a little while now but prior to dipping my toe in this area I’ve successfully implemented a few basic feed forward models into a production environment. I like to think I understand the premise posed by recurrent topology (GRU, LSTM, for instance). I’m struggling with the basic shape of the data and/or the proper parameters for my training data.

Here’s a basic example I’ve been playing with for many in, one out (omitting the fancy Keras utils that do automatic encoding / mapping):

The Data / imports

    import numpy as np
    from keras.utils import to_categorical, plot_model
    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing.sequence import pad_sequences
    from keras.models import Sequential
    from keras.layers import Dense, Embedding, LSTM, GRU, Dropout, Flatten
    from keras import callbacks, regularizers, optimizers

    X = [
        ["a", "b", "c"],
        ["b", "c", "d"],
        ["c", "d", "e"],
        ["d", "e", "f"],
        ["e", "f", "g"],
        ["f", "g", "h"],
        ["g", "h", "i"],
        ["h", "i", "j"],
    ]

    # Maps to each observation in X, 2 seq out
    y_b = [
        ["d", "e"],
        ["e", "f"],
        ["f", "g"],
        ["g", "h"],
        ["h", "i"],
        ["i", "j"],
        ["j", "k"],
        ["k", "l"],
    ]

    # Maps to each observation in X, 1 seq out
    y_a = [["d"], ["e"], ["f"], ["g"], ["h"], ["i"], ["j"], ["k"]]

    # Translate characters to their ordinal offsets ("a" -> 0, "b" -> 1, ...)
    letters = [chr(i) for i in range(97, 97 + 26)]
    encode = lambda seq: np.array([[letters.index(i) for i in obs] for obs in seq])

    X_encoded = encode(X)
    y_encoded = encode(y_a)  # y_a == predict single timestep, y_b == predict 2 timesteps

The resulting design matrix should look something like this:

    array([[0, 1, 2],
           [1, 2, 3],
           [2, 3, 4],
           [3, 4, 5],
           [4, 5, 6],
           [5, 6, 7],
           [6, 7, 8],
           [7, 8, 9]])

Reshaping for network

    sequence_length = 3
    X_reshaped = np.reshape(X_encoded, (len(X_encoded), sequence_length, 1))
    X_reshaped = to_categorical(X_reshaped)  # This is from keras.utils
    y_cat = to_categorical(y_encoded)

y_cat ends up one-hot encoded, with a 1 at the index that represents each unique entity:

    sequence_length = 3
    X_reshaped = np.reshape(X_encoded, (len(X_encoded), sequence_length, 1))
    ## Experimented with
    ## X_reshaped = to_categorical(X_reshaped)
    y_cat = to_categorical(y_encoded)
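
For reference, the one-hot layout can be reproduced with plain NumPy. This is a minimal sketch that mimics what `to_categorical` produces for the single-step targets; the helper name `one_hot` is made up for illustration, and the class count is assumed to be the largest index plus one.

```python
import numpy as np

def one_hot(indices, num_classes=None):
    """Return a one-hot array with a trailing class axis (illustrative helper)."""
    indices = np.asarray(indices)
    if num_classes is None:
        num_classes = int(indices.max()) + 1
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[indices]

y_encoded = np.arange(3, 11).reshape(-1, 1)   # "d".."k" as offsets 3..10, shape (8, 1)
y_cat = one_hot(y_encoded.ravel())            # one row per sample, one column per class
```

Each row of `y_cat` contains a single 1, located at the column given by the encoded letter.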

The Model

    ## Network topology
    model = Sequential()
    model.add(GRU(64, input_shape=(X_reshaped.shape[1], X_reshaped.shape[2])))

    ## This doesn't seem to be necessary
    # model.add(Flatten())

    ## This, I believe, sets the assumption about the output in terms of categorical encoding
    model.add(Dense(y_cat.shape[1], activation="softmax"))

    rmsprop = optimizers.rmsprop(lr=.1)
    model.compile(loss="categorical_crossentropy", optimizer=rmsprop, metrics=["accuracy"])

    model_params = dict(
        x=X_reshaped,
        y=y_cat,
        epochs=50,
        batch_size=2,
        verbose=1,
        # callbacks=[keras_tensorboard],
        validation_split=0.3
    )

    history = model.fit(**model_params)

My basic 3 in, 1 out (predicting y_a) network works just fine. When I try to predict more than one step (y_b), updating the parameters of the Dense layer accordingly, I run into problems. The shape and the assumptions I’ve made about the network seem to be incorrect, because the library throws an error about the shape of my ground truth (y).

Is this the proper topology for this type of problem?
Is my y encoded improperly?

Of course I’m interested in solving for the multi sequence output but more importantly, I’m hoping to understand “why” more than “how”. Thanks in advance for any advice or help!
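
The shape mismatch can be seen without running Keras. In this sketch (NumPy only, with a hypothetical `one_hot` helper standing in for `to_categorical`, applied to the first three rows of y_b), the two-step targets become a 3-D array of shape (samples, timesteps, classes), while a GRU followed by a single Dense softmax layer emits a 2-D array of shape (samples, classes). One common remedy, stated here as an assumption rather than the only answer, is to make the network emit one softmax per output step, for example by repeating the encoder state with `RepeatVector`, decoding with a recurrent layer that has `return_sequences=True`, and wrapping the final `Dense` in `TimeDistributed`.

```python
import numpy as np

letters = [chr(i) for i in range(97, 97 + 26)]

def one_hot(indices, num_classes):
    """Illustrative stand-in for to_categorical: adds a trailing class axis."""
    return np.eye(num_classes, dtype="float32")[np.asarray(indices)]

# First three rows of the two-step targets from the post.
y_b = [["d", "e"], ["e", "f"], ["f", "g"]]
y_b_encoded = np.array([[letters.index(c) for c in row] for row in y_b])

num_classes = len(letters)
y_b_cat = one_hot(y_b_encoded, num_classes)   # (samples, timesteps, classes)

# Output shape of a GRU -> Dense(num_classes) stack for the same batch.
single_step_output_shape = (len(y_b), num_classes)
shapes_match = y_b_cat.shape == single_step_output_shape
```

The targets carry an extra timestep axis that the single Dense output lacks, which is why the loss cannot line the two up.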

submitted by /u/butter-jesus

[D] Has anyone made a deep q network that plays games by using ONLY the screen as input?

Written on August 8, 2019. Posted in Reddit MachineLearning.

I’ve seen multiple projects where people claim to use only the screen, but they use some game that they made themselves (or a gym environment), which is obviously built to work with the network. Has anyone literally used only the pixels of the screen as the input into the network?
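
For illustration, this is roughly what feeding raw screen pixels into a network involves on the input side. A minimal sketch, assuming captured frames arrive as RGB NumPy arrays: convert to grayscale, downsample, and stack the most recent frames so the network can infer motion. The names `preprocess` and `FrameStack`, the stride-based downsampling, and the stack depth of 4 are illustrative choices, not a reproduction of any specific implementation.

```python
import numpy as np
from collections import deque

def preprocess(frame, stride=2):
    """Grayscale an RGB frame, downsample by striding, scale to [0, 1]."""
    gray = frame.astype(np.float32).mean(axis=2)
    return gray[::stride, ::stride] / 255.0

class FrameStack:
    """Keeps the last `depth` preprocessed frames as one (depth, H, W) array."""

    def __init__(self, depth=4):
        self.frames = deque(maxlen=depth)

    def push(self, frame):
        processed = preprocess(frame)
        if not self.frames:
            # Fill the stack with the first frame so the shape is fixed from step 0.
            self.frames.extend([processed] * self.frames.maxlen)
        else:
            self.frames.append(processed)
        return np.stack(self.frames)

stack = FrameStack(depth=4)
screen = np.full((84, 84, 3), 255, dtype=np.uint8)   # stand-in for a captured screen
state = stack.push(screen)
```

The resulting `state` array is what a convolutional Q-network would take as input; capturing the screen itself is left out of the sketch.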

submitted by /u/YuhFRthoYORKonhisass

[N] Video Understanding Using Temporal Cycle-Consistency Learning

Written on August 7, 2019. Posted in Reddit MachineLearning.

Blog post here. Excerpt from the blog:

We propose a potential solution using a self-supervised learning method called Temporal Cycle-Consistency Learning (TCC). This novel approach uses correspondences between examples of similar sequential processes to learn representations particularly well-suited for fine-grained temporal understanding of videos. We are also releasing our TCC codebase to enable end-users to apply our self-supervised learning algorithm to new and novel applications.
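
The idea of cycle-consistency can be illustrated with hard nearest neighbors. In this sketch (an illustration under assumptions, not the released TCC code), a frame embedding in one sequence is matched to its nearest neighbor in a second sequence, and that neighbor is matched back to the first sequence; the frame is cycle-consistent if the round trip returns to where it started. The function names and toy embeddings are hypothetical, and a trainable version would need a differentiable relaxation of the nearest-neighbor step.

```python
import numpy as np

def nearest(query, candidates):
    """Index of the candidate embedding closest to `query` (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(candidates - query, axis=1)))

def is_cycle_consistent(i, seq_u, seq_v):
    """True if frame i of seq_u maps into seq_v and back to frame i."""
    j = nearest(seq_u[i], seq_v)
    k = nearest(seq_v[j], seq_u)
    return k == i

# Toy 1-D embeddings for two videos of the same process at different speeds.
seq_u = np.array([[0.0], [1.0], [2.0], [3.0]])
seq_v = np.array([[0.1], [2.1], [2.9]])

consistent = [is_cycle_consistent(i, seq_u, seq_v) for i in range(len(seq_u))]
```

In this toy case the second frame of `seq_u` has no close counterpart in `seq_v`, so its round trip lands on a different frame and it is not cycle-consistent.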

submitted by /u/__arch__
