Gu, S., Lillicrap, T., Ghahramani, Z., Turner, R. E., & Levine, S. (2017). Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic. In Proceedings of the International Conference on Learning Representations 2017. Amherst, MA: OpenReview.net. Retrieved from https://openreview.net/pdf?id=SJ3rcZcxl.