[D] (on-policy) exploration when adding new actions
I am using policy gradient DRL with on-policy exploration in a discrete domain.
After some time, with significant exploration and decent network performance, I have to handle newly discovered actions. I can “widen” the network and initialize the added outputs to handle these actions.
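For concreteness, here is a minimal sketch of what the widening step could look like, assuming the policy head is a single linear layer followed by a softmax. The names `widen_output_layer`, `head_logits`, and `new_bias` are made up for illustration, and plain Python lists stand in for real network tensors. New rows get zero weights, so each new action's logit is just `new_bias` in every state and the relative probabilities of the old actions are unchanged.

```python
import math


def widen_output_layer(weights, biases, n_new, new_bias=0.0):
    """Return copies of (weights, biases) with n_new extra output rows.

    weights: list of rows, one row per action, each of length n_features.
    New rows are all zeros, so every new logit equals new_bias regardless
    of the input features (an assumption of this sketch, not a requirement).
    """
    n_features = len(weights[0])
    new_weights = [row[:] for row in weights]
    new_weights += [[0.0] * n_features for _ in range(n_new)]
    new_biases = list(biases) + [new_bias] * n_new
    return new_weights, new_biases


def head_logits(weights, biases, features):
    """Linear policy head: one logit per action."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def softmax(zs):
    """Numerically stable softmax over a list of logits."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


# Two old actions, two features; add one newly discovered action.
old_w, old_b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
new_w, new_b = widen_output_layer(old_w, old_b, n_new=1, new_bias=0.0)
probs = softmax(head_logits(new_w, new_b, [0.5, -0.5]))
```

The choice of `new_bias` sets how much probability mass the new actions start with, which is part of what I am unsure about.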
Is there a recommendation for increasing the exploration rate, and specifically for “over-exploring” these new actions?
The data domain itself is structured/tabular/wide.
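To make “over-exploring” concrete, one possible reading (a sketch only, not a recommendation) is to add a temporary bonus to the logits of the new actions and decay it over training steps. `boosted_policy`, `decayed_bonus`, `bonus`, and `half_life` are hypothetical names for this illustration. Note the assumption that actions are sampled from the boosted distribution and that the log-probabilities used in the policy gradient come from that same boosted distribution; otherwise the data is no longer on-policy.

```python
import math


def boosted_policy(logits, new_action_ids, bonus):
    """Softmax over logits after adding `bonus` to the new actions' logits."""
    boosted = [z + bonus if i in new_action_ids else z
               for i, z in enumerate(logits)]
    m = max(boosted)
    exps = [math.exp(z - m) for z in boosted]
    total = sum(exps)
    return [e / total for e in exps]


def decayed_bonus(initial_bonus, step, half_life):
    """Exponentially decay the bonus: it halves every `half_life` steps."""
    return initial_bonus * 0.5 ** (step / half_life)


# Action 2 is new; at step 0 its logit is raised by log(2).
probs = boosted_policy([0.0, 0.0, 0.0], {2}, decayed_bonus(math.log(2), 0, 1000))
```

With equal logits, a bonus of log(2) gives the new action twice the probability of each old one, and the boost fades as `step` grows.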