[D] optimizing clipping functions
In reinforcement learning I've noticed a trend in some (1, 2) papers that involves optimizing clipped surrogate functions. Has anyone seen any work that digs deeper into the effects of this? For example, this paper examines the relationship between clipped surrogate objectives and trust regions. The references I gave above use clipped surrogate objectives, but that doesn't have to be the case (e.g., drop the max and only optimize the clipped objective).
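For concreteness, here's a minimal NumPy sketch of the two variants I'm talking about: the standard PPO-style clipped surrogate, and the variant that drops the outer min/max and optimizes the clipped term alone. The function names and `eps` value are just placeholders for illustration:

```python
import numpy as np

def clipped_surrogate(ratio, adv, eps=0.2):
    # Standard PPO-style surrogate: pessimistic combination of the
    # unclipped term r*A and the clipped term clip(r, 1-eps, 1+eps)*A.
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv)

def clip_only(ratio, adv, eps=0.2):
    # Variant: drop the min/max entirely and optimize only the
    # clipped objective. Gradients vanish whenever the ratio
    # leaves [1-eps, 1+eps], regardless of the advantage's sign.
    return np.clip(ratio, 1 - eps, 1 + eps) * adv
```

The two agree whenever clipping is inactive; they differ when the ratio is outside the clip range in the direction the advantage would push it, which is exactly where the pessimistic min/max matters.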