Gu, Shixiang (Dept. Empirical Inference, Max Planck Institute for Intelligent Systems, Max Planck Society)
Gu, S., Lillicrap, T., Ghahramani, Z., Turner, R. E., & Levine, S. (2017). Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic. In ICLR 2017 - Conference Track. Amherst, MA: OpenReview.net. Retrieved from https://openreview.net/forum?id=SJ3rcZcxl.