Playing Atari with Deep RL
Playing Atari with Deep Reinforcement Learning (Mnih et al. 2013)
Preprocessing Steps
- Obtain raw RGB frames of size 210 × 160 × 3
- Convert to grayscale and downsample to 110 × 84
- Crop a representative 84 × 84 region that covers the playing area
- Stack the last 4 frames in history to form the 84 × 84 × 4 network input (see the sketch after this list)
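A minimal sketch of this pipeline in Python, assuming OpenCV for the color conversion and resize. The crop offset and the FrameStack helper are illustrative assumptions; only the target sizes (210 × 160 → 110 × 84 → 84 × 84, stacked to 84 × 84 × 4) come from the paper.

```python
# Sketch of the DQN frame preprocessing, not the authors' code.
from collections import deque

import cv2
import numpy as np


def preprocess(frame: np.ndarray) -> np.ndarray:
    """210x160x3 RGB frame -> 84x84 grayscale crop of the play area."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)                      # 210x160
    small = cv2.resize(gray, (84, 110), interpolation=cv2.INTER_AREA)   # 110x84
    # Crop offset is an assumption; the paper only says the 84x84 crop
    # roughly captures the playing area.
    return small[18:102, :]                                             # 84x84


class FrameStack:
    """Keeps the last 4 preprocessed frames as the 84x84x4 network input."""

    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)

    def reset(self, frame: np.ndarray) -> np.ndarray:
        processed = preprocess(frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(processed)
        return np.stack(self.frames, axis=-1)

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess(frame))
        return np.stack(self.frames, axis=-1)
```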
DQN
- An experience replay buffer stores past transitions and samples them uniformly at random for training, breaking correlations between consecutive updates (see the sketch after this list)
- A separate target network stabilizes the optimization targets: the target network, parameterized by θ⁻, is a copy of the online network θ that is refreshed only every C steps and supplies the bootstrap value in y = r + γ max_a' Q(s', a'; θ⁻)
- Clip rewards to [-1, 1] so that target magnitudes stay comparable across games
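A minimal sketch of these ingredients, assuming a uniform replay buffer over numpy transitions; the class and function names, buffer capacity, batch size, and gamma are illustrative and not from the paper.

```python
# Sketch of experience replay, reward clipping, and the target-network TD
# target y = clip(r, -1, 1) + gamma * max_a' Q(s', a'; theta_minus).
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Uniform experience replay over the most recent transitions."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        # Reward clipping keeps gradient scales comparable across games.
        self.buffer.append((s, a, float(np.clip(r, -1.0, 1.0)), s_next, done))

    def sample(self, batch_size: int = 32):
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r, s_next, done


def td_targets(q_next_target: np.ndarray, r: np.ndarray, done: np.ndarray,
               gamma: float = 0.99) -> np.ndarray:
    """q_next_target holds Q(s', .) from the frozen target network."""
    return r + gamma * (1.0 - done.astype(np.float32)) * q_next_target.max(axis=1)

# Every C gradient steps the target network is refreshed by copying the online
# parameters (e.g. target_net.load_state_dict(online_net.state_dict()) in PyTorch).
```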
Improving DQN
- Double Q-learning reduces overestimation bias (Van Hasselt, Guez, and Silver 2016); see the sketch after this list
- Averaged-DQN reduces target variance (Anschel, Baram, and Shimkin 2017)
- Hindsight Experience Replay (Andrychowicz et al. 2017)
- Distributional RL with quantile regression (Dabney et al. 2018)
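To make the bias-reduction idea concrete, here is a small sketch (assumed shapes and names, not taken from the papers) contrasting the standard DQN target with the Double DQN target, which selects the next action with the online network but evaluates it with the target network.

```python
# Sketch of the DQN vs. Double DQN bootstrap targets.
import numpy as np


def dqn_target(q_next_target: np.ndarray, r: np.ndarray, done: np.ndarray,
               gamma: float = 0.99) -> np.ndarray:
    # y = r + gamma * max_a Q(s', a; theta_minus): the same network both
    # selects and evaluates the action, which biases the target upward.
    return r + gamma * (1.0 - done) * q_next_target.max(axis=1)


def double_dqn_target(q_next_online: np.ndarray, q_next_target: np.ndarray,
                      r: np.ndarray, done: np.ndarray,
                      gamma: float = 0.99) -> np.ndarray:
    # y = r + gamma * Q(s', argmax_a Q(s', a; theta); theta_minus): the online
    # network selects the action, the target network evaluates it.
    a_star = q_next_online.argmax(axis=1)
    return r + gamma * (1.0 - done) * q_next_target[np.arange(len(r)), a_star]
```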
References
Andrychowicz, Marcin, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. 2017. “Hindsight Experience Replay.” In Advances in Neural Information Processing Systems, 5048–58.
Anschel, Oron, Nir Baram, and Nahum Shimkin. 2017. “Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning.” In Proceedings of the 34th International Conference on Machine Learning, Volume 70, 176–85. JMLR.org.
Dabney, Will, Mark Rowland, Marc G. Bellemare, and Rémi Munos. 2018. “Distributional Reinforcement Learning with Quantile Regression.” In Thirty-Second AAAI Conference on Artificial Intelligence.
Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. “Playing Atari with Deep Reinforcement Learning.” arXiv preprint arXiv:1312.5602.
Van Hasselt, Hado, Arthur Guez, and David Silver. 2016. “Deep Reinforcement Learning with Double Q-Learning.” In Thirtieth AAAI Conference on Artificial Intelligence.