[D] Giving OpenAI Five the capacity to adapt to the opponent
Because training OAI5 takes the equivalent of 45,000 years of gameplay experience, retraining the AI for each opponent is currently not feasible. The AI learns the best way to play and win overall rather than how to beat a particular opponent. This approach already gives amazing results, but I would like to consider what it would take to give the agents the ability to change their behavior according to the opposing player.
I have one suggestion for this and would love to get your feedback on it. The idea is to build, during the game, a latent space that encodes the opponent's behavior. Each agent would then choose its actions based not only on its observation of the world but also on this latent encoding of the opponent's behavior. We can think of this additional input as the latent space of an auto-encoder, which gives an encoded vector representation of its inputs.
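To make the idea concrete, here is a minimal sketch of a policy conditioned on an opponent code `z`, assuming the latent is simply concatenated to the agent's usual observation. `build_policy_input` and `LinearPolicy` are hypothetical names, and a single linear layer stands in for OAI5's real network. Where `z` comes from (an auto-encoder, an LSTM state, etc.) is left open here.

```python
import math
import random

def build_policy_input(own_observation, opponent_latent):
    """Hypothetical: the policy sees its usual observation plus the opponent code z."""
    return list(own_observation) + list(opponent_latent)

class LinearPolicy:
    """Toy stand-in for the real policy network: one linear layer + softmax."""

    def __init__(self, in_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(n_actions)]

    def action_probs(self, x):
        logits = [sum(w * v for w, v in zip(row, x)) for row in self.W]
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

# Same world observation, two different opponent codes.
obs = [0.1, 0.2]
policy = LinearPolicy(in_dim=5, n_actions=3, seed=0)
probs_vs_a = policy.action_probs(build_policy_input(obs, [1.0, -1.0, 0.5]))
probs_vs_b = policy.action_probs(build_policy_input(obs, [-1.0, 1.0, 0.0]))
```

Because `z` is part of the input, the same observation can yield different action distributions against different opponents, which is the adaptation the proposal is after.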
Such a representation could be trained in a self-supervised manner. During training, each agent would have an additional LSTM that tries to predict the next actions of the opponents in its observation space. Once this LSTM is reasonably well trained, its internal state should encode, in some way, the behavior of the opposing agents observed by that agent. Because each agent perceives different observations during the game, the five LSTM internal states can be combined and used as an additional input for every agent. This combined representation would encode the behavior of the opposing team as a whole, similar to the way a human team communicates about the other team and adapts to it during a game.
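A rough sketch of the self-supervised part, written in plain Python so it runs without dependencies. It assumes discrete opponent actions, a single-layer LSTM per agent, and mean pooling as the way to combine the five hidden states; all of these, and the names `OpponentLSTM`, `sequence_loss` and `team_latent`, are illustrative choices rather than anything OAI5 actually uses. Only the forward pass and the cross-entropy loss are shown; in practice an autodiff framework would backpropagate this loss through time.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

class OpponentLSTM:
    """Hypothetical per-agent opponent model: an LSTM cell plus a linear head
    that predicts the opponent's next discrete action."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)
        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        in_dim = obs_dim + hidden_dim
        # One weight matrix per gate: input, forget, output, candidate.
        self.W = {gate: init(hidden_dim, in_dim) for gate in "ifoc"}
        self.b = {gate: [0.0] * hidden_dim for gate in "ifoc"}
        self.head = init(n_actions, hidden_dim)
        self.hidden_dim = hidden_dim

    def initial_state(self):
        return [0.0] * self.hidden_dim, [0.0] * self.hidden_dim

    def step(self, obs, state):
        h, c = state
        x = list(obs) + h  # current observation concatenated with previous h
        pre = {gate: [a + b for a, b in zip(matvec(self.W[gate], x), self.b[gate])]
               for gate in "ifoc"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["c"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def predict_next_action(self, h):
        logits = matvec(self.head, h)
        m = max(logits)
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]

def sequence_loss(model, observations, next_actions):
    """Self-supervised target: at each step t, predict the opponent's action at
    t+1 from everything seen so far. Returns (mean cross-entropy, final h)."""
    state = model.initial_state()
    loss = 0.0
    for obs, target in zip(observations, next_actions):
        state = model.step(obs, state)
        probs = model.predict_next_action(state[0])
        loss -= math.log(probs[target])
    return loss / len(observations), state[0]

def team_latent(hidden_states):
    """Combine the agents' LSTM hidden states by mean pooling (an assumption)."""
    n = len(hidden_states)
    return [sum(col) / n for col in zip(*hidden_states)]

# Usage: five agents, each with its own opponent model and random toy data.
rng = random.Random(42)
agents = [OpponentLSTM(obs_dim=4, hidden_dim=8, n_actions=3, seed=k) for k in range(5)]
hidden = []
for model in agents:
    obs_seq = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(10)]
    acts = [rng.randrange(3) for _ in range(10)]
    loss, h = sequence_loss(model, obs_seq, acts)
    hidden.append(h)
z = team_latent(hidden)  # extra input fed to every agent's policy
```

Mean pooling keeps the combined code independent of agent ordering; concatenation or attention over the five states would be reasonable alternatives.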
The input is then no longer just what each agent perceives around it: each agent now also acts with respect to the opponent's behavior. That would give OAI5 the ability to adapt to the opponent during a game, and also to reuse the same representation when playing against the same opponent again.
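One way the reuse could look, sketched under the assumption that opponents can be identified between games: store the team latent at the end of each game and use it to initialize the next game against the same opponent, blending old and new codes with an exponential moving average. `OpponentMemory`, `opponent_id` and `momentum` are hypothetical names for illustration only.

```python
class OpponentMemory:
    """Hypothetical store of the team latent learned against each opponent."""

    def __init__(self, latent_dim, momentum=0.5):
        self.latent_dim = latent_dim
        self.momentum = momentum  # weight kept on the previously stored code
        self._store = {}

    def initial_latent(self, opponent_id):
        # Unknown opponents start from a neutral (all-zero) code.
        return list(self._store.get(opponent_id, [0.0] * self.latent_dim))

    def update(self, opponent_id, final_latent):
        # Blend the code from the latest game with what was stored before.
        old = self._store.get(opponent_id)
        if old is None:
            self._store[opponent_id] = list(final_latent)
        else:
            m = self.momentum
            self._store[opponent_id] = [m * o + (1 - m) * n
                                        for o, n in zip(old, final_latent)]

# Usage: warm-start the next game against "team_x" from past games.
memory = OpponentMemory(latent_dim=3)
start = memory.initial_latent("team_x")   # neutral code for a new opponent
memory.update("team_x", [1.0, 0.0, -1.0])  # end of game 1
memory.update("team_x", [0.0, 0.0, 1.0])   # end of game 2
```

The moving average keeps a single unusual game from overwriting everything learned about an opponent; a momentum of 0 would simply keep the most recent code.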
Do you think this proposal makes sense? I would also be interested to hear your own proposals, and which papers you think are most likely to bring a solution to this problem in the future of the field.