[D] Requirements for a fast model-building algorithm in one-shot model-based reinforcement learning
Comparison of algorithms for quickly extracting a model from real-world observations, to be used for predicting rewards over different future time spans.
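To make the task concrete, here is a rough sketch of what "predicting rewards at different future time spans" could look like once a one-step model has been extracted. The `model` interface and all names below are my own illustration, not something from the post:

```python
# Hypothetical sketch: using a learned one-step model to predict
# rewards at several future horizons (interface is assumed, not given).
def predict_future_rewards(model, state, horizons):
    """Roll the model forward and record the predicted reward at each horizon."""
    rewards = {}
    s = state
    for t in range(1, max(horizons) + 1):
        s, r = model(s)  # model: state -> (next_state, predicted_reward)
        if t in horizons:
            rewards[t] = r
    return rewards

# Toy stand-in model: state counts up, reward equals the new state value
toy = lambda s: (s + 1, float(s + 1))
print(predict_future_rewards(toy, 0, {1, 5, 20}))  # {1: 1.0, 5: 5.0, 20: 20.0}
```

A real model would of course be learned from observations, which is exactly what the algorithms below are compared on.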
Requirements:

* Time – Has memory of at least 20 steps so that it can handle temporal sequences
* 1sht – Can learn from a single example so that it doesn't need hundreds of training samples per class
* Hier – Is hierarchical so that it generalizes well (not just flat memorization)
* Arch – Can learn the architecture from data so that it doesn't need to be predefined by the developers
* Curr – Supports curriculum learning so that it can be trained successively and doesn't suffer from catastrophic forgetting
* Scal – Can be scaled up to at least 1 million inputs so that it isn't limited to toy environments
| Algo | Time | 1sht | Hier | Arch | Curr | Scal |
|---|---|---|---|---|---|---|
| NNGP | 🚫 | ✓ | ✓ | 🚫 | ✓ | ✓ |
| GHSOM | 🚫 | 🚫 | ✓ | ✓ | ✓ | ✓ |
| THSOM | ✓ | 🚫 | 🚫 | 🚫 | ✓ | ✓ |
| BPTT | ✓ | 🚫 | ✓ | 🚫 | 🚫 | ✓ |
| GA | ✓ | 🚫 | ✓ | ✓ | 🚫 | 🚫 |
| HTM | ✓ | 🚫 | ✓ | 🚫 | ✓ | ✓ |
Candidate algorithms:

* NNGP – Nearest Neighbor Gaussian Processes: https://amstat.tandfonline.com/doi/abs/10.1080/01621459.2015.1044091
* GHSOM – Growing Hierarchical Self-Organizing Map: http://www.ifs.tuwien.ac.at/~andi/ghsom/
* THSOM – Temporal Hebbian Self-Organizing Map: https://link.springer.com/chapter/10.1007/978-3-540-87536-9_65
* BPTT – Recurrent Neural Networks trained with Backpropagation Through Time, for example LSTMs: https://en.wikipedia.org/wiki/Long_short-term_memory
* GA – Genetic Algorithms: https://en.wikipedia.org/wiki/Genetic_algorithm
* HTM – Hierarchical Temporal Memory: https://en.wikipedia.org/wiki/Hierarchical_temporal_memory (or in German: https://de.wikipedia.org/wiki/Hierarchischer_Temporalspeicher)
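Since two of the candidates (GHSOM and THSOM) are descendants of the classic self-organizing map, a minimal sketch of the core SOM update may help readers unfamiliar with it. This is my own textbook-style illustration, not code from any of the linked projects: each input pulls the best-matching unit and its grid neighbors toward itself.

```python
import numpy as np

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One online SOM update: find the best-matching unit (BMU) and
    pull it and its grid neighbors toward the input x."""
    rows, cols, _ = weights.shape
    # BMU = unit whose weight vector is closest to x
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Gaussian neighborhood on the 2-D grid around the BMU
    gy, gx = np.mgrid[0:rows, 0:cols]
    grid_d2 = (gy - bmu[0]) ** 2 + (gx - bmu[1]) ** 2
    h = np.exp(-grid_d2 / (2 * sigma ** 2))[:, :, None]
    # Move all weights toward x, scaled by neighborhood strength
    return weights + lr * h * (x - weights)

rng = np.random.default_rng(0)
w = rng.random((4, 4, 3))          # a 4x4 map of 3-D weight vectors
w = som_step(w, np.array([1.0, 0.0, 0.0]))
```

GHSOM and THSOM add, respectively, a growing hierarchy of such maps and temporal (Hebbian) connections on top of this basic rule.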
The table probably has errors because I'm not an expert; I just want to watch progress in AGI. But the current backprop winter is boring me, and if no one else is taking the initiative, then an outsider from the audience has to.
As I don't understand the math in the NNGP paper, I'm assuming that NNGPs are essentially a hierarchical version of the simple nearest-neighbor algorithm, and that the two SOM descendants are standard self-organizing maps plus some fancy extensions for hierarchical architecture and time.
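For reference, the "simple nearest neighbor algorithm" behind that assumption can itself do one-shot learning: store a single example per class and predict by the closest stored vector. A minimal sketch (my own illustration, not from the NNGP paper):

```python
import numpy as np

class OneShotNN:
    """Nearest-neighbor classifier that can learn each class from one example."""
    def __init__(self):
        self.examples = []  # list of (vector, label) pairs

    def learn(self, x, label):
        # One-shot: a single stored example is enough to predict its class
        self.examples.append((np.asarray(x, dtype=float), label))

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        # Return the label of the closest stored example
        return min(self.examples, key=lambda e: np.linalg.norm(e[0] - x))[1]

clf = OneShotNN()
clf.learn([0.0, 0.0], "left")
clf.learn([1.0, 1.0], "right")
pred = clf.predict([0.9, 0.8])  # closest stored example is "right"
```

This flat memorizer satisfies the 1sht column but fails Hier and Time, which is presumably what the fancier candidates are trying to fix.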
Drop a note if you find an error and I will fix the table.
submitted by /u/wlorenz65