https://papers.nips.cc/paper/1143-improving-policies-without-measuring-merits.pdf (publisher version)
Dayan, P., & Singh, S. (1996). Improving Policies without Measuring Merits. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems 8 (pp. 1059-1065). Cambridge, MA, USA: MIT Press.