[D] Can someone explain how, in the reinforcement learning algorithm A3C, the multiple workers ensure they won't retrieve the same parameters from the global network they just updated?
I understand that in A3C ( https://arxiv.org/abs/1602.01783 ) each worker applies its gradient updates to the global network asynchronously.
But how do the workers ensure that they won't retrieve the same parameters from the global network they just updated?
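To make concrete what I mean by "retrieve", here is a minimal sketch of the pull-parameters / push-gradients loop as I understand it (assuming PyTorch with a shared-memory global model; `GlobalNet`, the dummy loss, and the hyperparameters are just placeholders I made up, not the paper's actual code). Step 1 is the retrieval I'm asking about:

    import torch
    import torch.nn as nn
    import torch.multiprocessing as mp

    class GlobalNet(nn.Module):  # hypothetical tiny policy/value net
        def __init__(self):
            super().__init__()
            self.fc = nn.Linear(4, 2)

        def forward(self, x):
            return self.fc(x)

    def worker(global_net, optimizer):
        local_net = GlobalNet()
        for _ in range(1000):
            # 1) synchronize: pull whatever the global parameters are *now*
            local_net.load_state_dict(global_net.state_dict())

            # 2) roll out and compute a loss (dummy loss for illustration)
            loss = local_net(torch.randn(1, 4)).pow(2).mean()

            # 3) push this worker's gradients into the *global* parameters,
            #    lock-free: other workers may update them at the same time
            optimizer.zero_grad()   # clear grads on the shared model
            local_net.zero_grad()   # clear grads accumulated locally
            loss.backward()
            for lp, gp in zip(local_net.parameters(), global_net.parameters()):
                gp._grad = lp.grad
            optimizer.step()
            # next iteration's pull in step 1 may well return parameters
            # that already contain this worker's own update

    if __name__ == "__main__":
        global_net = GlobalNet()
        global_net.share_memory()  # put parameters in shared memory
        optimizer = torch.optim.SGD(global_net.parameters(), lr=1e-3)
        procs = [mp.Process(target=worker, args=(global_net, optimizer))
                 for _ in range(4)]
        for p in procs: p.start()
        for p in procs: p.join()

In this sketch nothing stops step 1 from handing back the parameters the worker itself just wrote in step 3. Is that simply accepted in A3C, or is there a mechanism I'm missing?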
Thank you.
submitted by /u/ml4564