[D] Basic RNN predicting more than 1 timestep /w Keras (Python).
I’ve been working with RNNs for a little while now, but prior to dipping my toe in this area I’d successfully implemented a few basic feed-forward models in a production environment. I like to think I understand the premise of recurrent topologies (GRU, LSTM, for instance). What I’m struggling with is the basic shape of the data and/or the proper parameters for my training data.
Here’s a basic example I’ve been playing with for many-in, one-out (omitting the fancy Keras utils that do automatic encoding / mapping):
The Data / imports
```python
import numpy as np
from keras.utils import to_categorical, plot_model
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM, GRU, Dropout, Flatten
from keras import callbacks, regularizers, optimizers

X = [
    ["a", "b", "c"],
    ["b", "c", "d"],
    ["c", "d", "e"],
    ["d", "e", "f"],
    ["e", "f", "g"],
    ["f", "g", "h"],
    ["g", "h", "i"],
    ["h", "i", "j"],
]

# Maps to each observation in X, 2 timesteps out
y_b = [
    ["d", "e"], ["e", "f"], ["f", "g"], ["g", "h"],
    ["h", "i"], ["i", "j"], ["j", "k"], ["k", "m"]
]

# Maps to each observation in X, 1 timestep out
y_a = [["d"], ["e"], ["f"], ["g"], ["h"], ["i"], ["j"], ["k"]]

# The lowercase alphabet, plus a helper that translates each
# character to its ordinal offset
letters = [chr(i) for i in range(97, 97 + 26)]
encode = lambda seq: np.array([[letters.index(i) for i in obs] for obs in seq])

X_encoded = encode(X)
y_encoded = encode(y_a)  # y_a == predict single timestep, y_b == predict 2 timesteps
```
The resulting design matrix should look something like this:
```python
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7],
       [6, 7, 8],
       [7, 8, 9]])
```
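As a sanity check, the encoding step can be reproduced standalone with plain numpy (a minimal sketch, assuming `letters` is the lowercase alphabet):

```python
import numpy as np

# Minimal standalone reproduction of the encoding step:
# map each character to its index in the lowercase alphabet.
letters = [chr(i) for i in range(97, 97 + 26)]
encode = lambda seq: np.array([[letters.index(c) for c in obs] for obs in seq])

X = [["a", "b", "c"], ["b", "c", "d"], ["c", "d", "e"], ["d", "e", "f"],
     ["e", "f", "g"], ["f", "g", "h"], ["g", "h", "i"], ["h", "i", "j"]]

X_encoded = encode(X)
print(X_encoded.shape)  # (8, 3)
print(X_encoded[0])     # [0 1 2]
```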
Reshaping for network
```python
sequence_length = 3
X_reshaped = np.reshape(X_encoded, (len(X_encoded), sequence_length, 1))
X_reshaped = to_categorical(X_reshaped)  # This is from keras.utils
y_cat = to_categorical(y_encoded)
```
y_cat ends up one-hot encoded / binary-like, with a 1 at the index representing each unique entity. I also experimented with skipping the one-hot step for X:
```python
sequence_length = 3
X_reshaped = np.reshape(X_encoded, (len(X_encoded), sequence_length, 1))
## Experimented with
## X_reshaped = to_categorical(X_reshaped)
y_cat = to_categorical(y_encoded)
```
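For intuition about the resulting shapes, here is a numpy-only sketch of what `to_categorical` produces on this data (the `one_hot` helper below is a hypothetical stand-in that mimics its behavior, so this runs without Keras):

```python
import numpy as np

def one_hot(a, num_classes=None):
    # Minimal stand-in for keras.utils.to_categorical: a trailing
    # length-1 axis is squeezed, then each integer becomes a row of
    # zeros with a single 1 at that integer's index.
    a = np.asarray(a, dtype=int)
    if a.ndim > 1 and a.shape[-1] == 1:
        a = a.squeeze(-1)
    if num_classes is None:
        num_classes = a.max() + 1
    return np.eye(num_classes)[a]

X_encoded = np.arange(8)[:, None] + np.arange(3)   # the design matrix above
X_reshaped = one_hot(X_encoded.reshape(8, 3, 1))   # (8, 3, 10): one vector per timestep
y_a_encoded = np.arange(3, 11)[:, None]            # single-timestep targets "d".."k"
y_cat = one_hot(y_a_encoded)                       # (8, 11): one vector per sample
print(X_reshaped.shape, y_cat.shape)
```

The key point: the single-timestep target collapses to a 2-D (samples, classes) matrix, which is exactly what a final `Dense` softmax layer emits.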
The Model
```python
## Network topology
model = Sequential()
model.add(GRU(64, input_shape=(X_reshaped.shape[1], X_reshaped.shape[2])))

## This doesn't seem to be necessary
# model.add(Flatten())

## This, I believe, sets the assumption about the output in terms of categorical encoding
model.add(Dense(y_cat.shape[1], activation="softmax"))

rmsprop = optimizers.RMSprop(lr=0.1)
model.compile(loss="categorical_crossentropy", optimizer=rmsprop, metrics=["accuracy"])

model_params = dict(
    x=X_reshaped,
    y=y_cat,
    epochs=50,
    batch_size=2,
    verbose=1,
    # callbacks=[keras_tensorboard],
    validation_split=0.3,
)

history = model.fit(**model_params)
```
My basic 3-in, 1-out network (predicting `y_a`) works just fine. When I try to predict more than one timestep (`y_b`) and update the parameters of the final `Dense` layer accordingly, I run into problems. The assumptions I've made about the output shape seem to be incorrect, because the library throws an error about the shape of my ground truth (`y`).
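The mismatch can be seen with plain numpy before Keras is ever involved (a sketch; the one-hot step stands in for `to_categorical`, and the fixes named in the comments are suggestions, not from the post):

```python
import numpy as np

# Why the two-step case breaks: one-hot encoding a (samples, 2)
# target yields a 3-D array, while a single Dense softmax layer
# emits a 2-D (samples, classes) output.
letters = [chr(i) for i in range(97, 97 + 26)]
y_b = [["d", "e"], ["e", "f"], ["f", "g"], ["g", "h"],
       ["h", "i"], ["i", "j"], ["j", "k"], ["k", "m"]]
y_b_encoded = np.array([[letters.index(c) for c in obs] for obs in y_b])
n_classes = y_b_encoded.max() + 1
y_b_cat = np.eye(n_classes)[y_b_encoded]  # what to_categorical yields here
print(y_b_cat.shape)                      # 3-D, not (samples, classes)

# One common fix (hypothetical): flatten the target to 2-D and pair it
# with a Dense(timesteps * n_classes) head; another is to keep it 3-D
# and end the model with a sequence-producing head, e.g.
# RepeatVector(2) + GRU(return_sequences=True) + TimeDistributed(Dense).
y_b_flat = y_b_cat.reshape(len(y_b_cat), -1)
print(y_b_flat.shape)
```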
- Is this the proper topology for this type of problem?
- Is my `y` encoded improperly?
Of course I’m interested in solving for the multi-sequence output, but more importantly, I’m hoping to understand the “why” more than the “how”. Thanks in advance for any advice or help!
submitted by /u/butter-jesus