[D] Has anyone used continuous RL algorithms to output the parameters of a probability distribution that actions are then sampled from?
So I’m working on a problem where I need an agent to perform several different actions at each time step. One solution I have in mind is to have the agent output the mean and covariance of a Gaussian distribution and then sample the action vector from that distribution. Has anyone seen anything like this? Does this seem like an immediately bad idea?
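
To make the idea concrete, here is a rough sketch of what I mean. It assumes a diagonal covariance (one standard deviation per action dimension) rather than a full covariance matrix, and the `mean` and `log_std` values are hypothetical stand-ins for whatever the policy network would output for the current state. The function name is made up for illustration.

```python
import math
import random


def sample_gaussian_action(mean, log_std, rng):
    """Sample one action vector from a diagonal Gaussian and return its log-density."""
    action, log_prob = [], 0.0
    for mu, ls in zip(mean, log_std):
        std = math.exp(ls)          # exponentiating keeps the std strictly positive
        a = rng.gauss(mu, std)      # one sample per action dimension
        z = (a - mu) / std
        # Log-density of a 1-D Gaussian; dimensions are independent, so the terms add up.
        log_prob += -0.5 * z * z - ls - 0.5 * math.log(2.0 * math.pi)
        action.append(a)
    return action, log_prob


rng = random.Random(0)
# Hypothetical policy outputs for a 3-dimensional action (assumed values, not from a trained model).
mean = [0.5, -1.0, 2.0]
log_std = [-0.5, 0.0, -1.0]
action, log_prob = sample_gaussian_action(mean, log_std, rng)
print(action, log_prob)
```

The log-density is returned alongside the sample because, as far as I understand, a policy-gradient style update would need it to weight the sampled action.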