[R] Badger architecture by GoodAI
Badger = an architecture and a learning procedure where:
An agent is made up of many experts
All experts share the same communication policy (expert policy), but have different internal memory states
There are two levels of learning, an inner loop (with a communication stage) and an outer loop
Inner loop – The agent’s behavior and adaptation emerge from experts communicating with each other. Experts send messages (of any complexity) to each other and update their internal memories/states based on observations, incoming messages, and their own internal state from the previous time-step. The expert policy is fixed and does not change during the inner loop.
Inner loop loss need not even be a proper loss function. It can be any kind of structured feedback guiding the adaptation during the agent’s lifetime.
Outer loop – An expert policy is discovered over generations of agents, so that strategies for solving problems in diverse environments can emerge quickly in the inner loop.
The agent’s objective is to adapt quickly to novel tasks
Badger exhibits the following novel properties:
Roles of experts and connectivity among them are assigned dynamically at inference time
Learned communication protocol with context-dependent messages of varied complexity
Generalizes to different numbers and types of inputs/outputs
Can be trained to handle variations in architecture during both training and testing
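The two-level procedure described above can be illustrated with a toy sketch. This is not GoodAI's implementation. Everything in it is an assumption made for illustration: the four-parameter tanh expert policy, scalar expert states, messages computed as the mean of the other experts' states, the scalar feedback signal, the random-mutation hill climbing used as a stand-in for the outer loop, and all names (expert_step, inner_loop, outer_loss, outer_loop, TASKS). It only shows the structure: one shared policy, per-expert states that change during the inner loop, and a policy that changes only in the outer loop.

```python
import math
import random

# Hypothetical toy tasks: (target output, number of experts in the agent).
TASKS = [(-0.5, 2), (0.3, 3), (0.7, 5)]


def expert_step(policy, state, incoming, feedback):
    """Shared expert policy (assumed form): one tanh unit with four weights."""
    w_state, w_msg, w_fb, bias = policy
    return math.tanh(w_state * state + w_msg * incoming + w_fb * feedback + bias)


def inner_loop(policy, n_experts, target, steps=10, seed=0):
    """One agent lifetime: the policy is fixed, only expert states change."""
    rng = random.Random(seed)
    # Every expert runs the same policy but starts from its own internal state.
    states = [rng.uniform(-1.0, 1.0) for _ in range(n_experts)]
    output = sum(states) / n_experts
    for _ in range(steps):
        # Structured feedback observed by every expert (assumed: signed error).
        feedback = target - output
        total = sum(states)
        new_states = []
        for s in states:
            # Assumed message: mean state of all the other experts.
            incoming = (total - s) / (n_experts - 1) if n_experts > 1 else 0.0
            new_states.append(expert_step(policy, s, incoming, feedback))
        states = new_states
        output = sum(states) / n_experts
    return (target - output) ** 2


def outer_loss(policy):
    """Average end-of-lifetime error over tasks with different expert counts."""
    losses = [inner_loop(policy, n, t, seed=i) for i, (t, n) in enumerate(TASKS)]
    return sum(losses) / len(losses)


def outer_loop(generations=200, seed=0):
    """Stand-in outer loop: keep a mutated policy only if it lowers the loss."""
    rng = random.Random(seed)
    best = [0.0, 0.0, 0.0, 0.0]
    best_loss = outer_loss(best)
    for _ in range(generations):
        candidate = [w + rng.gauss(0.0, 0.3) for w in best]
        loss = outer_loss(candidate)
        if loss < best_loss:
            best, best_loss = candidate, loss
    return best, best_loss


if __name__ == "__main__":
    policy, loss = outer_loop()
    print("policy:", policy)
    print("outer loss:", loss)
    # The same policy can be run with an expert count not seen in TASKS.
    print("7 experts:", inner_loop(policy, 7, 0.2))
```

Because the policy takes only an expert's own state, an aggregated message, and the feedback signal, the same four weights can be run with any number of experts. That is the property the sketch is meant to make concrete, under the simplifying assumptions listed above.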