[D] Incorporating prior knowledge into actor-critic / policy gradient methods for portfolio allocation
I have to solve a portfolio allocation problem, which can be formulated as follows:
Given an input (financial indicators), output a vector of asset weights (that sum to 1) that maximizes a “performance function”.
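To make the formulation concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the linear scoring policy, the names `policy_weights` and `performance`, the made-up numbers, and the use of a single-period portfolio return as a stand-in for the “performance function”. The softmax is just one common way to get non-negative weights that sum to 1.

```python
import math

def softmax(scores):
    """Map unconstrained scores to non-negative weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def policy_weights(indicators, params):
    """Hypothetical linear policy: one score per asset, then softmax.

    params[i] is the (assumed) coefficient vector for asset i.
    """
    scores = [sum(p * x for p, x in zip(row, indicators)) for row in params]
    return softmax(scores)

def performance(weights, asset_returns):
    """Placeholder performance function: portfolio return for one period."""
    return sum(w * r for w, r in zip(weights, asset_returns))

# Made-up example: 2 indicators, 3 assets.
indicators = [0.5, -1.0]
params = [[0.1, 0.2], [0.0, -0.3], [0.4, 0.1]]
weights = policy_weights(indicators, params)
```

In the RL version, `params` would be what the actor learns, and `performance` would play the role of the reward.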
Translating this formulation to an RL problem seems pretty straightforward. However, I don’t have much data (a couple of hundred data points). So, I was wondering whether it is possible to incorporate prior knowledge in order to train better with less data.
Can I incorporate this knowledge by using a “custom” advantage function in an actor-critic method? What about using Bayesian policy gradient / Bayesian actor-critic methods?
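For example, one way a “custom” advantage might look (purely an illustrative assumption, not an established method) is to score the chosen weights against a prior benchmark portfolio, so that the prior acts as the baseline. The equal-weight prior, the function names, and the numbers below are all hypothetical.

```python
def portfolio_return(weights, asset_returns):
    """Single-period portfolio return for a given weight vector."""
    return sum(w * r for w, r in zip(weights, asset_returns))

def benchmark_advantage(weights, prior_weights, asset_returns):
    """Assumed 'custom' advantage: how much the chosen weights beat
    (or trail) a prior benchmark portfolio over the same period."""
    chosen = portfolio_return(weights, asset_returns)
    baseline = portfolio_return(prior_weights, asset_returns)
    return chosen - baseline

# Made-up example with 3 assets and an equal-weight prior.
prior_weights = [1 / 3, 1 / 3, 1 / 3]
chosen_weights = [0.5, 0.3, 0.2]
asset_returns = [0.02, -0.01, 0.00]
adv = benchmark_advantage(chosen_weights, prior_weights, asset_returns)
```

The advantage is zero whenever the policy reproduces the prior portfolio, so the actor would only be pushed away from the prior when doing so actually pays off on the data.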
Does that make sense?