# Playing Atari with Deep RL

## Playing Atari With Deep RL (Mnih et al. 2013)

### Preprocessing Steps

- Start from raw RGB frames of \(210 \times 160\) pixels
- Convert to grayscale and downsample to \(110 \times 84\)
- Crop an \(84 \times 84\) region that roughly captures the playing area
- Stack the last 4 preprocessed frames to form the \(84 \times 84 \times 4\) input
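
The steps above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the channel-mean grayscale conversion, the nearest-neighbour downsampling, the `crop_top=18` offset, and the names `preprocess` and `FrameStack` are all hypothetical choices, not details taken from the paper.

```python
from collections import deque

import numpy as np


def preprocess(frame, crop_top=18):
    """Turn one 210x160x3 RGB frame into an 84x84 grayscale image.

    Assumptions (not from the paper): grayscale is the channel mean,
    downsampling is nearest-neighbour, and the crop starts at row 18.
    """
    gray = frame.astype(np.float32).mean(axis=2)  # (210, 160)
    # Pick 110 rows and 84 columns spread evenly over the original grid.
    rows = np.linspace(0, gray.shape[0] - 1, 110).round().astype(int)
    cols = np.linspace(0, gray.shape[1] - 1, 84).round().astype(int)
    small = gray[np.ix_(rows, cols)]  # (110, 84)
    return small[crop_top:crop_top + 84, :]  # (84, 84)


class FrameStack:
    """Keeps the last k preprocessed frames and returns them stacked."""

    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def push(self, frame):
        processed = preprocess(frame)
        if not self.frames:
            # At episode start, fill the history with copies of the first frame.
            self.frames.extend([processed] * self.frames.maxlen)
        else:
            self.frames.append(processed)
        return np.stack(self.frames, axis=-1)  # (84, 84, k)
```

Calling `FrameStack().push(frame)` on each new emulator frame yields the stacked \(84 \times 84 \times 4\) array described in the last bullet.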

### DQN

- Experience replay buffer: transitions are stored and later sampled at random for updates
- Separate target network stabilizes optimization targets:

\begin{equation} \delta = r_t + \gamma \max_a Q(s_{t+1}, a; \theta') - Q(s_t, a_t; \theta) \end{equation}

The network parameterized by \(\theta'\) is a snapshot of the online network taken at an earlier point in training, so the optimization target does not shift with every update to \(\theta\).
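
To make the role of the snapshot concrete, here is a minimal sketch of the TD error computation. It assumes small tabular arrays `q_online` and `q_target` as stand-ins for the two networks; the function name `td_error`, the `done` flag (no bootstrapping from terminal states), and the default `gamma` are illustrative assumptions.

```python
import numpy as np


def td_error(q_online, q_target, s, a, r, s_next, gamma=0.99, done=False):
    """delta = r + gamma * max_a' Q(s', a'; theta') - Q(s, a; theta).

    q_online and q_target are hypothetical (n_states, n_actions) tables
    standing in for the networks with parameters theta and theta'.
    """
    # Bootstrap from the frozen snapshot; terminal states contribute nothing.
    bootstrap = 0.0 if done else gamma * np.max(q_target[s_next])
    return r + bootstrap - q_online[s, a]


q_online = np.array([[0.0, 1.0], [2.0, 3.0]])
q_target = q_online.copy()    # snapshot of the online table
q_online[1] = [10.0, 10.0]    # later updates do not affect the snapshot
delta = td_error(q_online, q_target, s=0, a=1, r=1.0, s_next=1, gamma=0.5)
```

Because the bootstrap term reads from `q_target`, the change made to `q_online[1]` after the snapshot leaves the target for this transition untouched.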

- Clip \(\delta\) to \(\left[-1, 1\right]\)
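
The replay buffer and the error clipping from the list above can be sketched with the standard library alone. The class name `ReplayBuffer`, its methods, and the helper `clip_td_error` are hypothetical names chosen for this illustration, and uniform sampling is assumed.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.storage = deque(maxlen=capacity)

    def add(self, s, a, r, s_next, done):
        self.storage.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Sampling without replacement breaks up temporally correlated steps.
        return random.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def clip_td_error(delta, bound=1.0):
    """Clamp the TD error to the interval [-bound, bound]."""
    return max(-bound, min(bound, delta))
```

A training loop would call `add` after every environment step, draw a minibatch with `sample`, and pass each TD error through `clip_td_error` before using it in the update.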

## Improving DQN

- Double Q-learning reduces bias (Van Hasselt, Guez, and Silver 2016)
- Averaged-DQN reduces variance (Anschel, Baram, and Shimkin 2017)
- Hindsight Experience Replay (Andrychowicz et al. 2017)
- Distributional RL (Dabney et al. 2018)
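
As a brief illustration of the first item in the list, the Double DQN error, written in the notation used on this page, selects the action with the online parameters \(\theta\) and evaluates it with the snapshot \(\theta'\). The symbol \(\delta_{\text{double}}\) is a label introduced here for convenience.

\begin{equation} \delta_{\text{double}} = r_t + \gamma Q\left(s_{t+1}, \arg\max_a Q(s_{t+1}, a; \theta); \theta'\right) - Q(s_t, a_t; \theta) \end{equation}

Decoupling action selection from action evaluation is what reduces the overestimation bias of taking a max over noisy estimates.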

## Bibliography

Andrychowicz, Marcin, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. 2017. “Hindsight Experience Replay.” In *Advances in Neural Information Processing Systems*, 5048–58.

Anschel, Oron, Nir Baram, and Nahum Shimkin. 2017. “Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning.” In *Proceedings of the 34th International Conference on Machine Learning-Volume 70*, 176–85. JMLR.org.

Dabney, Will, Mark Rowland, Marc G. Bellemare, and Rémi Munos. 2018. “Distributional Reinforcement Learning with Quantile Regression.” In *Thirty-Second AAAI Conference on Artificial Intelligence*.

Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. “Playing Atari with Deep Reinforcement Learning.” *arXiv Preprint arXiv:1312.5602*.

Van Hasselt, Hado, Arthur Guez, and David Silver. 2016. “Deep Reinforcement Learning with Double Q-Learning.” In *Thirtieth AAAI Conference on Artificial Intelligence*.