[D] Policy Distillation in a continuous action space with no knowledge of teacher distribution
Has anyone seen any work on policy distillation in a continuous action space where the teacher's distribution is unknown (a black-box policy that returns only the action)? My guess is to train the student with something along the lines of a negative log-likelihood (NLL) loss on the teacher's actions.
submitted by /u/CartPole
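
To make the NLL idea concrete, here is a minimal sketch, not a reference implementation. It assumes a linear-Gaussian student with a state-independent standard deviation, and a stand-in teacher (`black_box_teacher`, hypothetical) that is linear with Gaussian noise and exposes only its actions. The names `gaussian_nll`, `distill`, and `W_true`, and all hyperparameters, are illustrative assumptions.

```python
import numpy as np

def gaussian_nll(actions, mean, log_std):
    """Mean negative log-likelihood of actions under a diagonal Gaussian."""
    z = (actions - mean) / np.exp(log_std)
    per_dim = 0.5 * z**2 + log_std + 0.5 * np.log(2.0 * np.pi)
    return per_dim.sum(axis=1).mean()

def distill(states, teacher_actions, lr=0.05, steps=2000):
    """Fit a linear-Gaussian student to (state, action) pairs by
    plain gradient descent on the Gaussian NLL."""
    n, s_dim = states.shape
    a_dim = teacher_actions.shape[1]
    W = np.zeros((s_dim, a_dim))   # student mean weights
    b = np.zeros(a_dim)            # student mean bias
    log_std = np.zeros(a_dim)      # one learned log-std per action dimension
    for _ in range(steps):
        mean = states @ W + b
        std = np.exp(log_std)
        z = (teacher_actions - mean) / std
        # Analytic gradients of the mean NLL.
        g_mean = -z / std / n                      # dNLL/dmean, shape (n, a_dim)
        W -= lr * states.T @ g_mean
        b -= lr * g_mean.sum(axis=0)
        log_std -= lr * (1.0 - z**2).mean(axis=0)  # dNLL/dlog_std
    return W, b, log_std

# Hypothetical black-box teacher: we only ever see the actions it returns.
rng = np.random.default_rng(0)
W_true = rng.normal(size=(3, 2))

def black_box_teacher(s):
    return s @ W_true + 0.3 * rng.normal(size=(s.shape[0], 2))

states = rng.normal(size=(2000, 3))
actions = black_box_teacher(states)

W, b, log_std = distill(states, actions)
print("final NLL:", gaussian_nll(actions, states @ W + b, log_std))
print("learned std:", np.exp(log_std))
```

One caveat with this setup: if the teacher is deterministic, a learned standard deviation will tend to collapse toward zero, so it is usually clamped or held fixed. With a fixed standard deviation, the Gaussian NLL reduces to a scaled mean-squared error on the teacher's actions plus a constant, i.e. ordinary regression on the actions.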