[D] How does AlphaStar, a NN that plays StarCraft, encode its output?
For something like AlphaGo (which plays a simple board game), I understand that the neural network can output a “grid” vector the size of the board, and the largest value in the output that is also a legal move is the move you make*. In this case, the neural network is answering the same simple question repeatedly: “Where do I move?”. I know how to encode the answer to that question. There are 361 board points in Go (plus a pass move), and they are fixed, so a vector of length ~362 can encode every possible action.
(* Actually, AlphaGo uses the NN in a tree search. The NN does not generate moves directly.)
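To make the fixed-action-space case concrete, here is a minimal sketch of the masked-argmax idea described above. All names are illustrative, not AlphaGo's actual code: the network emits one score per board point, illegal moves are masked out, and the highest-scoring legal move is chosen.

```python
import numpy as np

BOARD_POINTS = 19 * 19  # 361 intersections on a Go board (pass omitted for brevity)

def pick_move(scores, legal_mask):
    """scores: (361,) float array; legal_mask: (361,) bool array.
    Returns the index of the highest-scoring legal move."""
    masked = np.where(legal_mask, scores, -np.inf)  # forbid illegal moves
    return int(np.argmax(masked))

rng = np.random.default_rng(0)
scores = rng.normal(size=BOARD_POINTS)
legal = np.ones(BOARD_POINTS, dtype=bool)
legal[:100] = False  # pretend the first 100 points are occupied
move = pick_move(scores, legal)
assert legal[move]  # the chosen move is always one of the legal ones
```

Because the action set never changes shape, one softmax-sized output layer covers every situation the network will ever see.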
I don’t understand how a neural network like AlphaStar can output an answer to the much broader question “What should I do?”. The answers can be “build a building”, “kill one of your own buildings”, “build a unit”, “attack a unit”, “move 2 of your units to a position”, “move 3 of your units to another position”, “load your units into a transport”, “use one of your units’ special abilities”, “research a new technology”, etc.
How are the answers to such a broad question encoded? Do we know how AlphaStar does it?
I’m especially baffled by the changing number of units in StarCraft. Encoding the actions 2 units can take seems significantly different from encoding the actions 3 units can take. Do they use a multi-agent setup? Is each unit running its own NN and determining its actions individually?
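For what it's worth, a variable number of units need not mean a variable-shaped output. One generic trick (a sketch of one possible approach, not necessarily what AlphaStar does) is a per-unit selection head: score every unit you currently own and include each one whose score clears a threshold, so the same network handles 2 units or 30.

```python
import numpy as np

def select_units(unit_scores, threshold=0.0):
    """unit_scores: (num_units,) array, one score per currently-owned unit.
    Returns the indices of the units whose score clears the threshold."""
    return [i for i, s in enumerate(unit_scores) if s > threshold]

# The same head works regardless of how many units exist right now.
print(select_units(np.array([1.2, -0.5, 0.3])))         # → [0, 2]
print(select_units(np.array([0.4, 0.9, 1.1, -2.0, 0.2])))  # → [0, 1, 2, 4]
```

The output length adapts to the input length, which sidesteps the “2 units vs. 3 units” shape problem without needing one NN per unit.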