[D] Reinforcement learning with combined continuous and discrete action space?
Hi, I’m working on a reinforcement learning project to teach an AI to play a video game. Specifically, I’m implementing A2C. The RL literature has many examples of either continuous or discrete action spaces, but many video games have both types of input, e.g. mouse position and keyboard input.
In my specific scenario I have three kinds of action: continuous mouse input (X and Y coordinates), a set of 5 weapons from which the agent chooses exactly 1, and 5 buttons (for jumping, shooting, etc.) that can be pressed simultaneously.
Can I have multiple output heads for the different types of actions and take the mean of the losses? Will it converge?
Specifically, there would be two heads with two nodes each for the mouse position mean and variance, one head with 5 nodes and softmax activation for the weapon selection, and another head with 5 nodes and sigmoid activation for the “button actions”.
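To make the head layout concrete, here is a minimal pure-Python sketch of how one composite action could be sampled from the raw head outputs. The names (`sample_action`, `mouse_log_var`, etc.) are hypothetical. It also assumes the variance head outputs a log-variance so the variance stays positive, which is a common choice but not something stated above.

```python
import math
import random

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_action(mouse_mean, mouse_log_var, weapon_logits, button_logits, rng=random):
    # Mouse: one independent Gaussian per coordinate.
    # Assumption: the head predicts log-variance, so std = exp(0.5 * log_var).
    mouse = [rng.gauss(mu, math.exp(0.5 * lv))
             for mu, lv in zip(mouse_mean, mouse_log_var)]
    # Weapon: categorical over the softmax probabilities, exactly one is chosen.
    probs = softmax(weapon_logits)
    weapon = rng.choices(range(len(probs)), weights=probs)[0]
    # Buttons: one independent Bernoulli per button, so several can be pressed at once.
    buttons = [1 if rng.random() < sigmoid(z) else 0 for z in button_logits]
    return mouse, weapon, buttons
```

Each head is sampled independently given the shared network state. That is the simplest factorization, and other designs (e.g. conditioning one head on another head's sample) are possible.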
Of course you need different loss and entropy functions for the continuous mouse position and the discrete weapon selection. I don’t know how to calculate loss and entropy of my “button actions”, but that’s another question: https://www.reddit.com/r/MachineLearning/comments/9z8tok/d_reinforcement_learning_with_multiple/
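For the “button actions”, one possible treatment is to model each button as an independent Bernoulli variable whose probability is the sigmoid output. The sketch below assumes exactly that; the function names are hypothetical. Under this independence assumption, the log-probability of a button combination is the sum of the per-button log-probabilities, and the entropy is the sum of the per-button entropies.

```python
import math

EPS = 1e-8  # keeps log() away from 0 when a sigmoid saturates

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def buttons_log_prob(button_logits, pressed):
    # log p(pressed) = sum_i [ b_i * log p_i + (1 - b_i) * log(1 - p_i) ]
    total = 0.0
    for z, b in zip(button_logits, pressed):
        p = min(max(sigmoid(z), EPS), 1.0 - EPS)
        total += math.log(p) if b else math.log(1.0 - p)
    return total

def buttons_entropy(button_logits):
    # H = -sum_i [ p_i * log p_i + (1 - p_i) * log(1 - p_i) ]
    total = 0.0
    for z in button_logits:
        p = min(max(sigmoid(z), EPS), 1.0 - EPS)
        total -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return total
```

The log-probability term is the negative of the usual binary cross-entropy with the sampled button vector as the target, so a framework's binary cross-entropy function could in principle be reused for it.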
So if I have the loss and entropy for each of mouse position, weapon selection and “button actions”, should I average all the losses and all the entropies to use them in my final loss function? Should they be weighted in some way?
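One way to think about the combination, assuming the heads are treated as independent given the state: the joint log-probability of the composite action is then the sum of the per-head log-probabilities, and the joint entropy is the sum of the per-head entropies. Averaging instead of summing only rescales by a constant, which mostly amounts to a different effective learning rate or coefficient. The sketch below is an illustration under that assumption, not a definitive recipe. `a2c_policy_loss` and `entropy_weights` are hypothetical names, and the value loss is left out.

```python
def a2c_policy_loss(head_log_probs, head_entropies, advantage,
                    entropy_coef=0.01, entropy_weights=None):
    # head_log_probs / head_entropies: one number per head
    # (e.g. mouse, weapon, buttons) for a single sampled action.
    if entropy_weights is None:
        entropy_weights = [1.0] * len(head_entropies)
    # Independence assumption: the joint log-prob is the sum over heads.
    joint_log_prob = sum(head_log_probs)
    # Optional per-head weights (hypothetical knob) on the entropy bonus.
    entropy = sum(w * h for w, h in zip(entropy_weights, head_entropies))
    # Standard policy-gradient term plus entropy bonus; the advantage is
    # treated as a constant (no gradient would flow through it).
    return -(joint_log_prob * advantage) - entropy_coef * entropy
```

Per-head entropy weights may be worth having because the Gaussian head's differential entropy is not on the same scale as the discrete entropies and can even be negative, so a single shared coefficient might favor one head over the others.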