Is the Policy Gradient a Gradient?
AAMAS, pp. 939-947, 2020.
The policy gradient theorem describes the gradient of the expected discounted return with respect to an agent's policy parameters. However, most policy gradient methods do not use the discount factor in the manner originally prescribed, and therefore do not optimize the discounted objective. It has been an open question in RL as to whic...
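
The distinction drawn in the abstract can be written out as a short sketch. The notation here is an assumption rather than something taken from this listing: $\pi_\theta$ is the policy with parameters $\theta$, $\gamma \in [0, 1)$ is the discount factor, $R_t$ is the reward at time $t$, and $Q^{\theta}_{\gamma}$ is the discounted action-value function. The first two lines state the discounted objective and its gradient as given by the policy gradient theorem; the third is the update direction commonly seen in practice, which omits the $\gamma^t$ weighting on each time step.

```latex
% Discounted objective (assumed standard notation)
J_\gamma(\theta) = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R_t \,\middle|\, \theta \right]

% Policy gradient theorem: each time step's term is weighted by \gamma^t
\nabla_\theta J_\gamma(\theta)
  = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\,
      \nabla_\theta \log \pi_\theta(A_t \mid S_t)\,
      Q^{\theta}_{\gamma}(S_t, A_t) \,\middle|\, \theta \right]

% Direction commonly used in practice: the \gamma^t weight is dropped,
% so this is not, in general, the gradient of J_\gamma
\Delta(\theta)
  = \mathbb{E}\!\left[ \sum_{t=0}^{\infty}
      \nabla_\theta \log \pi_\theta(A_t \mid S_t)\,
      Q^{\theta}_{\gamma}(S_t, A_t) \,\middle|\, \theta \right]
```

The two expressions differ only in the per-step factor $\gamma^{t}$, which is why an update that follows the third line does not, in general, ascend the discounted objective in the first.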