[D] Learning a transition function for skill representation from game outcomes among a group of agents playing 2-player zero-sum games
This is an idea I've been bouncing around in my head for the past year or so, but I've been struggling to come up with a way to apply ML to it.
The idea is this. Imagine you had a group of agents who randomly match up and repeatedly play 2-player zero-sum games against one another. The game has a known structure, and each agent has a strategy whose success varies depending on the opponent it faces.
The goal is to represent each agent as a point in a feature space such that the features capture each agent's "skill level" as well as possible. Then, given two agents' representations, the outcome of a matchup between them can be predicted.
The catch is that the features have to be inferred purely from observed game outcomes: after each matchup, the features of the two agents involved are adjusted based on the new information, and agents' ratings converge toward their "true" values as more games are played.
The idea is inspired by the Elo rating system used widely in chess and competitive gaming, because Elo is a special case of what I'm trying to do. Each player has a scalar Elo rating as a measure of skill (the underlying model assumes performance is roughly normally or logistically distributed around it). After each observed game, the winner gains points and the loser loses points, but the amount transferred depends on the rating gap between the two players: an upset moves more points than an expected result. The difference in Elo ratings also yields a predicted win probability for either player before a game is played between them.
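For concreteness, here is a minimal sketch of the standard Elo update described above (the 400 scale constant and K-factor of 32 are the conventional defaults, not anything specific to this proposal):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    # Predicted probability that player A beats player B under the
    # logistic Elo model (400-point scale).
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32):
    # score_a is 1.0 for an A win, 0.0 for a loss, 0.5 for a draw.
    # The same delta is added to one player and subtracted from the
    # other, so total rating is conserved (zero-sum transfer).
    e_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - e_a)
    return rating_a + delta, rating_b - delta
```

Note how the transferred amount `k * (score_a - e_a)` already depends on the rating gap: a favorite who wins gains little, while an underdog who wins gains a lot.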
In the more general case, where each agent has an N-dimensional rating and the game has more than two possible outcomes, could you learn the optimal transition function to apply after each game, one that maximizes outcome prediction accuracy?
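One way to frame this, as a sketch rather than a full answer: give each agent an N-dimensional vector, predict outcomes with a logistic model on a learned score difference, and make the per-game "transition" an online gradient step on the prediction loss. With N = 1 and a fixed weight this collapses back to an Elo-like update; learning the transition function itself would then mean parameterizing the update (e.g., as a small network) and meta-optimizing it for prediction accuracy. All names and constants below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, DIM, LR = 8, 4, 0.1

# Each agent's latent skill vector, initialized near zero.
ratings = rng.normal(scale=0.01, size=(N_AGENTS, DIM))
# A weight vector mapping the feature difference to a win logit;
# fixed here, but it could be learned jointly with the update rule.
w = np.ones(DIM)

def predict(i: int, j: int) -> float:
    # P(agent i beats agent j) via a logistic on the score difference.
    logit = w @ (ratings[i] - ratings[j])
    return 1.0 / (1.0 + np.exp(-logit))

def observe(i: int, j: int, i_won: bool) -> None:
    # The online "transition function": a gradient step on the log-loss
    # of the outcome prediction. (outcome - p) is the gradient of the
    # log-likelihood w.r.t. the logit, so this is the vector-valued
    # analogue of Elo's K * (score - expected) update.
    p = predict(i, j)
    grad = float(i_won) - p
    ratings[i] += LR * grad * w
    ratings[j] -= LR * grad * w
```

Under this framing, "learning the transition function" is a bilevel problem: the inner loop runs `observe` over a stream of games, and an outer loop adjusts the update rule's parameters (here just `LR` and `w`) to maximize held-out outcome prediction accuracy.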