
[R] OgmaNeo2 Reinforcement Learning

Link to the blog post: https://ogma.ai/2019/06/ogmaneo2-and-reinforcement-learning/

Hey all,

We have finally figured out a good way of integrating reinforcement learning into our biologically plausible, fully online/incremental learning system, OgmaNeo2, which implements Sparse Predictive Hierarchies (SPH). Below we provide a few demos and a high-level description, as well as links where you can learn more.

Included among the demos is a real-world mini-sumo robot fight in which the agents are implemented with our system. The match proceeds in episodes and resets itself automatically.

For those wondering what SPH is, we have included a link to a more in-depth presentation in the blog post. Here is a quick summary:

SPH is a fully online/incremental lifelong learning system that is biologically plausible and does not use backpropagation. It is also extremely fast: it can run in real time, with learning enabled, on platforms as small as a Raspberry Pi Zero. The architecture is a bidirectional hierarchy of very sparse encoder-decoder pairs, activated in two passes (although an asynchronous implementation is also possible): an up-pass followed by a down-pass. All input/output occurs at the “bottom” of the hierarchy.

Each encoder-decoder pair forms a layer, and each layer clocks at a slower rate than the layer directly below it. We call this “exponential memory”: information is encoded into slower and slower timescales going up the hierarchy and decoded back into faster timescales going down, allowing the system to bridge time lags that grow exponentially with the number of layers.
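To make the clocking scheme concrete, here is a minimal Python sketch of the two-pass update with per-layer clocking. Every name here (Layer, encode, decode, ticks_per_update) is an illustrative placeholder rather than the actual OgmaNeo2 API, and the encoder/decoder bodies are stubs standing in for the real sparse coding; this is a sketch of the idea, not the implementation.

```python
class Layer:
    def __init__(self, ticks_per_update):
        # A layer updates once for every `ticks_per_update` updates of the
        # layer directly below it (use 1 for the bottom layer).
        self.ticks_per_update = ticks_per_update
        self.ticks = 0
        self.hidden = None      # sparse hidden state (encoder output)
        self.prediction = None  # decoder output, fed downward as feedback

    def encode(self, visible):
        # Stub standing in for the sparse encoder.
        return visible

    def decode(self, feedback):
        # Stub standing in for the decoder: predicts the next input of the
        # layer below, conditioned on this layer's state and top-down feedback.
        return self.hidden if feedback is None else feedback


def step(layers, bottom_input):
    # Up-pass: a layer clocks only when the layer below has ticked enough
    # times, so higher layers run at exponentially slower timescales.
    visible = bottom_input
    top_updated = -1
    for i, layer in enumerate(layers):
        layer.ticks += 1
        if layer.ticks < layer.ticks_per_update:
            break  # this layer and everything above it sit this step out
        layer.ticks = 0
        layer.hidden = layer.encode(visible)
        visible = layer.hidden
        top_updated = i

    # Down-pass: decode from the highest updated layer back to the bottom,
    # each layer conditioning on feedback from the layer above.
    above = top_updated + 1
    feedback = layers[above].prediction if above < len(layers) else None
    for layer in reversed(layers[:above]):
        layer.prediction = layer.decode(feedback)
        feedback = layer.prediction

    return layers[0].prediction  # all input/output happens at the bottom


# Usage: three layers; each clocks at half the rate of the one below.
layers = [Layer(1), Layer(2), Layer(2)]
for t in range(8):
    prediction = step(layers, bottom_input=t)
```

With a tick ratio of 2 everywhere above the bottom, layer k only updates roughly every 2^k steps, which is where the exponential range of timescales (and hence the ability to bridge long time lags) comes from.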

For reinforcement learning, we removed the original decoders, which simply predicted the next timestep(s) of input, and replaced them with a swarm of reinforcement learning agents that all seek to locally maximize the same reward.
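As a rough illustration of that swarm structure (and emphatically not the actual OgmaNeo2 agent, whose details are in the linked material), here is a hedged Python sketch in which each output column gets its own tiny tabular Q-learner and every learner receives the same global reward:

```python
import random

class ColumnAgent:
    """One swarm member: a small tabular Q-learner for one output column."""

    def __init__(self, num_states, num_actions, lr=0.1, gamma=0.9, epsilon=0.1):
        self.q = [[0.0] * num_actions for _ in range(num_states)]
        self.lr, self.gamma, self.epsilon = lr, gamma, epsilon
        self.prev = None  # (state, action) from the previous step

    def act(self, state):
        # Epsilon-greedy action selection over this column's Q-values.
        if random.random() < self.epsilon:
            action = random.randrange(len(self.q[state]))
        else:
            action = max(range(len(self.q[state])), key=lambda a: self.q[state][a])
        self.prev = (state, action)
        return action

    def learn(self, reward, new_state):
        # Local one-step Q-learning update; every agent sees the same reward.
        if self.prev is None:
            return
        s, a = self.prev
        target = reward + self.gamma * max(self.q[new_state])
        self.q[s][a] += self.lr * (target - self.q[s][a])


# One agent per output column, all maximizing the same shared reward.
swarm = [ColumnAgent(num_states=16, num_actions=4) for _ in range(8)]

def swarm_step(states, reward):
    # states: one (sparse) hidden-state index per column from the encoders;
    # reward: the single scalar reward broadcast to the whole swarm.
    for agent, s in zip(swarm, states):
        agent.learn(reward, s)
    return [agent.act(s) for agent, s in zip(swarm, states)]
```

The point of the sketch is only the structure: many small, local learners sharing one reward signal, in place of decoders trained to predict the next input.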

Let us know what you think!

submitted by /u/CireNeikual