[D] How does AlphaStar, a NN that plays StarCraft, encode its output?

Written by torontoai on October 9, 2019. Posted in Reddit MachineLearning.

For something like AlphaGo (that plays a simple board game), I understand that the neural network can output a “grid” vector the size of the board, and the move you make is the one with the largest output value among the valid moves*. In this case, the neural network is solving the same simple question repeatedly: “Where do I move?”. I know how to encode the answer to that question. There are around 400 possible moves in Go (361 points on a 19×19 board), and they are fixed, so a vector of length 400 can encode every possible action.

(* Actually, AlphaGo uses the NN in a tree search. The NN does not generate moves directly.)
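To make the fixed-size encoding concrete, here is a minimal sketch of what I mean, assuming the network hands back one score per board point. The names `pick_move` and `index_to_point` are made up for illustration, and this ignores the tree search entirely; it is not how AlphaGo itself is implemented.

```python
BOARD_SIZE = 19  # standard Go board: 19 * 19 = 361 points


def pick_move(scores, legal):
    """Return the flat index of the highest-scoring legal move.

    scores: list of BOARD_SIZE * BOARD_SIZE numbers (hypothetical network output).
    legal:  list of booleans of the same length, True where a move is allowed.
    """
    best_index = None
    for i, (score, ok) in enumerate(zip(scores, legal)):
        # Skip illegal points, then keep the largest score seen so far.
        if ok and (best_index is None or score > scores[best_index]):
            best_index = i
    if best_index is None:
        raise ValueError("no legal move available")
    return best_index


def index_to_point(index):
    """Convert a flat index back to (row, column) on the board."""
    return divmod(index, BOARD_SIZE)
```

The point is that the output has the same length and the same meaning at every step, so a single masked argmax is enough to decode it.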

I don’t understand how a neural network like AlphaStar can output an answer to the much broader question “What should I do?”. The answers can be “build a building”, “kill one of your own buildings”, “build a unit”, “attack a unit”, “move 2 of your units to a position”, “move 3 of your units to another position”, “load your units into a transport”, “use one of your unit’s special abilities”, “research a new technology”, etc.

How are the answers to such a broad question encoded? Do we know how AlphaStar does it?

I’m especially baffled by the changing number of units in StarCraft. Encoding the actions 2 units can take seems significantly different from encoding the actions 3 units can take. Do they use a multi-agent setup? Is each unit running its own NN and determining its own actions individually?

submitted by /u/Buttons840