Playing Atari with Deep RL

Playing Atari with Deep RL (Mnih et al., 2013)

Preprocessing Steps

Obtain the raw $210 \times 160$ RGB frames
Convert to grayscale and downsample to $110 \times 84$
Crop an $84 \times 84$ region that roughly captures the playing area
Stack the 4 most recent frames to form the $84 \times 84 \times 4$ input
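A minimal NumPy sketch of these four steps follows. Several details are assumptions made for illustration and are not taken from the paper:

- the grayscale weights (ITU-R BT.601 luma);
- the nearest-neighbour resize;
- the crop offset `CROP_TOP`;
- padding the history with copies of the first frame at the start of an episode.

```python
from collections import deque

import numpy as np

DOWN_H, DOWN_W = 110, 84  # size after downsampling (from the notes)
CROP = 84                 # side of the square crop (from the notes)
CROP_TOP = 18             # hypothetical row offset of the playing area
HISTORY = 4               # number of stacked frames (from the notes)


def to_grayscale(frame: np.ndarray) -> np.ndarray:
    """(210, 160, 3) uint8 RGB -> (210, 160) float32 luminance (BT.601 weights, assumed)."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return frame.astype(np.float32) @ weights


def downsample(img: np.ndarray, out_h: int = DOWN_H, out_w: int = DOWN_W) -> np.ndarray:
    """Nearest-neighbour resize: keep evenly spaced rows and columns."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[np.ix_(rows, cols)]


def crop(img: np.ndarray, top: int = CROP_TOP, size: int = CROP) -> np.ndarray:
    """Take a size x size window starting at row `top`."""
    return img[top:top + size, :size]


def preprocess(frame: np.ndarray) -> np.ndarray:
    return crop(downsample(to_grayscale(frame)))


class FrameStack:
    """Keeps the last HISTORY preprocessed frames; state() is (84, 84, HISTORY)."""

    def __init__(self, k: int = HISTORY):
        self.frames = deque(maxlen=k)

    def reset(self, frame: np.ndarray) -> np.ndarray:
        first = preprocess(frame)
        for _ in range(self.frames.maxlen):  # assumed: pad history with the first frame
            self.frames.append(first)
        return self.state()

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess(frame))  # the oldest frame drops out automatically
        return self.state()

    def state(self) -> np.ndarray:
        return np.stack(list(self.frames), axis=-1)
```

Using a fixed-length `deque` means the oldest frame is discarded on every `step`, so the stacked state always holds exactly the last four frames.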

DQN

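Use an experience replay buffer: store transitions $(s_t, a_t, r_t, s_{t+1})$ and train on minibatches sampled uniformly at random from it, which breaks the correlation between consecutive samples
A separate target network stabilizes the optimization targets: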

$\delta = r_{t} + \gamma \max_{a} Q(s_{t+1}, a; \theta') - Q(s_{t}, a_{t}; \theta)$

The target network, parameterized by $\theta'$, is a snapshot of the online network taken at some earlier point and refreshed only periodically, so the optimization target does not shift with every update.
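The replay buffer, the frozen snapshot $\theta'$ and the TD error $\delta$ fit together roughly as below. To keep the sketch self-contained, it makes these assumptions:

- a NumPy Q-table stands in for the convolutional network;
- the update is a plain tabular gradient step;
- `ReplayBuffer`, `td_errors`, `train_step` and the snapshot period `SYNC_EVERY` are hypothetical names and values;
- the bootstrap term is dropped when `done` is 1, which is the usual convention for terminal states rather than something stated in these notes.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s_next, done) transitions."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, float(done)))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive transitions.
        batch = random.sample(list(self.buffer), batch_size)
        s, a, r, s_next, done = (np.array(x) for x in zip(*batch))
        return s, a, r, s_next, done

    def __len__(self):
        return len(self.buffer)


def td_errors(q_online, q_target, s, a, r, s_next, done, gamma=0.99):
    """delta = r + gamma * max_a Q(s_next, a; theta') - Q(s, a; theta).

    The bootstrap term is dropped for terminal transitions (done == 1).
    """
    bootstrap = q_target[s_next].max(axis=1)  # evaluated with the frozen snapshot theta'
    target = r + gamma * (1.0 - done) * bootstrap
    return target - q_online[s, a]            # Q(s, a; theta) from the online table


def train_step(q_online, q_target, buffer, batch_size=32, lr=0.1, gamma=0.99):
    s, a, r, s_next, done = buffer.sample(batch_size)
    delta = td_errors(q_online, q_target, s, a, r, s_next, done, gamma)
    # For a table, a gradient step on 0.5 * delta**2 is Q(s, a) += lr * delta;
    # np.add.at accumulates correctly when the same (s, a) repeats in a batch.
    np.add.at(q_online, (s, a), lr * delta)
    return delta


# Toy usage: 5 states, 2 actions, reward 1 everywhere, state 0 treated as terminal.
rng = np.random.default_rng(0)
q_online = np.zeros((5, 2))
q_target = q_online.copy()  # theta' starts as a snapshot of theta
buffer = ReplayBuffer(capacity=1000)
for _ in range(200):
    s, a, s_next = int(rng.integers(5)), int(rng.integers(2)), int(rng.integers(5))
    buffer.add(s, a, 1.0, s_next, done=(s_next == 0))

SYNC_EVERY = 50  # hypothetical snapshot period
for step in range(1, 501):
    train_step(q_online, q_target, buffer)
    if step % SYNC_EVERY == 0:
        q_target[:] = q_online  # refresh the snapshot: theta' <- theta
```

Between syncs, `q_target` does not change, so every minibatch in that window regresses towards the same fixed targets.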

Clip $\delta$ to $[-1, 1]$
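Clipping $\delta$ this way is commonly implemented as a Huber loss with threshold 1. Its derivative with respect to $\delta$ is exactly $\mathrm{clip}(\delta, -1, 1)$, so the gradient magnitude never exceeds 1, while small errors still get the usual squared-error treatment. The short NumPy check below estimates that derivative with finite differences. The function names are illustrative only.

```python
import numpy as np


def clip_td_error(delta):
    """Clip each TD error to [-1, 1]."""
    return np.clip(delta, -1.0, 1.0)


def huber_loss(delta, kappa=1.0):
    """Quadratic for |delta| <= kappa, linear beyond it."""
    abs_d = np.abs(delta)
    return np.where(abs_d <= kappa, 0.5 * delta**2, kappa * (abs_d - 0.5 * kappa))


deltas = np.array([-3.0, -0.5, 0.0, 0.2, 2.5])
eps = 1e-6
# Central finite difference approximates d huber_loss / d delta at each point.
numeric_grad = (huber_loss(deltas + eps) - huber_loss(deltas - eps)) / (2 * eps)
print(numeric_grad)            # agrees with the clipped errors up to rounding
print(clip_td_error(deltas))
```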

Improving DQN

Double Q-learning reduces overestimation bias (Van Hasselt, Guez, and Silver, 2016)
Averaged-DQN reduces variance (Anschel, Baram, and Shimkin, 2017)
Hindsight Experience Replay, for sparse-reward, goal-conditioned tasks (Andrychowicz et al., 2017)
Distributional RL with quantile regression (Dabney et al., 2018)
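The first two items change only how the bootstrap target is built:

- Double DQN lets the online parameters choose the next action and the target parameters evaluate it.
- Averaged-DQN takes the max over the mean of the $K$ most recent snapshots.

The hedged sketch below uses Q-tables in place of networks, and the function names are hypothetical. In the demo every true action value is 0, so any positive mean target is pure overestimation bias.

```python
import numpy as np


def dqn_target(q_target, r, s_next, done, gamma=0.99):
    # Standard DQN: theta' both selects and evaluates the next action.
    return r + gamma * (1.0 - done) * q_target[s_next].max(axis=1)


def double_dqn_target(q_online, q_target, r, s_next, done, gamma=0.99):
    # Double DQN: the online table selects a*, the target table evaluates it.
    a_star = q_online[s_next].argmax(axis=1)
    return r + gamma * (1.0 - done) * q_target[s_next, a_star]


def averaged_dqn_target(snapshots, r, s_next, done, gamma=0.99):
    # Averaged-DQN: average the K most recent snapshots, then maximise.
    q_avg = np.mean([q[s_next] for q in snapshots], axis=0)
    return r + gamma * (1.0 - done) * q_avg.max(axis=1)


# Bias demo: all true values are 0; the estimates carry independent unit noise.
rng = np.random.default_rng(0)
n_states, n_actions = 10_000, 10
q_online = rng.normal(size=(n_states, n_actions))
q_target = rng.normal(size=(n_states, n_actions))
s_next = np.arange(n_states)
r = np.zeros(n_states)
done = np.zeros(n_states)

plain = dqn_target(q_target, r, s_next, done, gamma=1.0)
double = double_dqn_target(q_online, q_target, r, s_next, done, gamma=1.0)
print(plain.mean())   # clearly positive: max over noisy estimates overshoots 0
print(double.mean())  # near 0: selection and evaluation use independent noise
```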

References

Defeating the Deadly Triad: | random walks and lots of ♥s

Bibliography

Andrychowicz, Marcin, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. 2017. “Hindsight Experience Replay.” In Advances in Neural Information Processing Systems, 5048–58.

Anschel, Oron, Nir Baram, and Nahum Shimkin. 2017. “Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning.” In Proceedings of the 34th International Conference on Machine Learning, Volume 70, 176–85. JMLR.org.

Dabney, Will, Mark Rowland, Marc G. Bellemare, and Rémi Munos. 2018. “Distributional Reinforcement Learning with Quantile Regression.” In Thirty-Second AAAI Conference on Artificial Intelligence.

Van Hasselt, Hado, Arthur Guez, and David Silver. 2016. “Deep Reinforcement Learning with Double Q-Learning.” In Thirtieth AAAI Conference on Artificial Intelligence.

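Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. “Playing Atari with Deep Reinforcement Learning.” arXiv preprint arXiv:1312.5602.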

Jethro's Braindump
