Skip to main content

Blog

Learn About Our Meetup

5000+ Members

MEETUPS

LEARN, CONNECT, SHARE

Join our meetup, learn, connect, share, and get to know your Toronto AI community. 

JOB POSTINGS

INDEED POSTINGS

Browse through the latest deep learning, ai, machine learning postings from Indeed for the GTA.

CONTACT

CONNECT WITH US

Are you looking to sponsor space, be a speaker, or volunteer, feel free to give us a shout.

[P] RNNs and Reinforcement learning

Keep in mind that I am still in the early phase of learning ML. I cannot disclose the exact task/problem I am working on (not related to NLP), but the below task captures the essence of it.

(REINFORCEMENT LEARNING)

Input – A paragraph written in English.

Output – On a scale of 1-10 (continuous/not discrete scale) predict the level of English of each sentence. For example, My English is poor should score better than I have bad English. Even though both are grammatically correct.

Example input: Hey! how are you? It has been so long since I last saw you.

Example output: [5.544554, 5.890909] (made up numbers)

My approach:

  1. Encode each word. (fixed length if that matters)
  2. Break paragraph into sentences, because prediction for a sentence will not depend upon other sentences.
  3. Pad every sentence so that they have same length.
  4. For every sentence:

i. Pass each word of the sentence to a RNN encoder, And extract the hidden state corresponding to the last word (before padding). (for example: Sentence: i am fine padded_word padded_word, RNN output: [A,B,C,D,E] so I extract RNN output/hidden_state C. (not sure if this is the right thing to do)

j. Pass this hidden state C to RNN decoder, which makes the prediction. This prediction leads to a reward.

  1. Use PPO (proximal policy optimization).

I hope this is clear and am sorry for being so vague about my problem. If it matters, I have a few fully connected layers between encoder and decoder.

So, is this the best approach for this problem?

Does PPO works well with RNNs?

Also, what might be the reason that the network is not learning even when I am using normalized environment?

Any help would be highly appreciated.

submitted by /u/xicor7017
[link] [comments]