# Playing Atari with Deep RL

## Playing Atari With Deep RL (Mnih et al., 2013)

### Preprocessing Steps

- Obtain raw pixels of size \(210 \times 160\)
- Convert to grayscale and downsample to \(110 \times 84\)
- Crop an \(84 \times 84\) region that roughly captures the playing area
- Stack the last 4 frames in history to form the \(84 \times 84 \times 4\) input
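The preprocessing steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the original implementation: the luminance weights, the nearest-neighbour downsampling, the crop offset `CROP_TOP`, and the choice to fill the initial history by repeating the first frame are all assumptions made for the example.

```python
from collections import deque

import numpy as np

RAW_SHAPE = (210, 160, 3)   # raw RGB frame (height, width, channels)
DOWNSAMPLED = (110, 84)     # size after downsampling
CROP = 84                   # side length of the final square crop
CROP_TOP = 18               # hypothetical row offset of the playing area
HISTORY = 4                 # number of stacked frames


def preprocess_frame(rgb, crop_top=CROP_TOP):
    """Grayscale, downsample to 110x84, then crop an 84x84 region."""
    # Assumed luminance weights; any grayscale conversion would fit the notes.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = rgb.astype(np.float32) @ weights            # (210, 160)

    # Nearest-neighbour downsampling by index selection (an assumption).
    rows = np.arange(DOWNSAMPLED[0]) * gray.shape[0] // DOWNSAMPLED[0]
    cols = np.arange(DOWNSAMPLED[1]) * gray.shape[1] // DOWNSAMPLED[1]
    small = gray[np.ix_(rows, cols)]                   # (110, 84)

    return small[crop_top:crop_top + CROP, :]          # (84, 84)


class FrameStack:
    """Keeps the last HISTORY preprocessed frames as one (84, 84, 4) array."""

    def __init__(self, history=HISTORY):
        self.frames = deque(maxlen=history)

    def push(self, frame):
        if not self.frames:
            # Assumption: repeat the first frame to fill the history.
            self.frames.extend([frame] * self.frames.maxlen)
        else:
            self.frames.append(frame)
        return np.stack(self.frames, axis=-1)
```

Calling `FrameStack().push(preprocess_frame(raw))` on a `210 x 160 x 3` array yields an array of shape `(84, 84, 4)`, with the most recent frame in the last channel.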

### DQN

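- Experience replay buffer: transitions \((s_t, a_t, r_t, s_{t+1})\) are stored and sampled uniformly at random for minibatch updates, which breaks the correlation between consecutive samples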
- Separate target network stabilizes optimization targets:

\begin{equation} \delta = r_t + \gamma \max_a Q(s_{t+1}, a; \theta') - Q(s_t, a_t; \theta) \end{equation}

The network parameterized by \(\theta'\) is a snapshot of the network at some point in time, so the optimization target doesn’t change so rapidly.
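To make the role of the snapshot concrete, here is a small sketch of the temporal-difference error using a linear stand-in for the Q-network. The function names, the discount `GAMMA`, and the `done` flag for terminal transitions (which the equation above does not show) are assumptions for illustration only.

```python
import numpy as np

GAMMA = 0.99  # assumed discount factor


def q_values(theta, state):
    """Linear stand-in for a Q-network: one weight row per action."""
    return theta @ state


def td_error(theta, theta_target, s, a, r, s_next, done, gamma=GAMMA):
    """delta = r + gamma * max_a Q(s', a; theta') - Q(s, a; theta)."""
    # Assumption: no bootstrapping from terminal states.
    bootstrap = 0.0 if done else gamma * np.max(q_values(theta_target, s_next))
    return r + bootstrap - q_values(theta, s)[a]


def sync_target(theta):
    """Take a snapshot of the online parameters for the target network."""
    return theta.copy()
```

Because `sync_target` copies the parameters, later updates to `theta` change only the `Q(s_t, a_t; theta)` term; the bootstrap term stays fixed until the next snapshot is taken.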

- Clip \(\delta\) to \(\left[-1, 1\right]\)
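Two of the ingredients listed in this section, the experience replay buffer and the clipping of \(\delta\), might look like the following in code. This is a sketch under stated assumptions: the `Transition` fields, the class name `ReplayBuffer`, uniform sampling without replacement, and the default `bound` of 1 are illustrative choices, not details taken from these notes.

```python
import random
from collections import deque, namedtuple

import numpy as np

# Hypothetical transition record; the field names are illustrative.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are dropped first."""

    def __init__(self, capacity, seed=None):
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform sampling without replacement (an assumption).
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def clip_td_error(delta, bound=1.0):
    """Clip TD errors element-wise to [-bound, bound]."""
    return np.clip(delta, -bound, bound)
```

Clipping the error bounds the magnitude of each update, so a single transition with a very large error cannot dominate a minibatch.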

## Improving DQN

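- Double Q-learning reduces overestimation bias by selecting the greedy action with the online network and evaluating it with the target network (Van Hasselt, Guez, and Silver, n.d.)
- Averaged-DQN reduces the variance of the target by averaging previously learned Q-value estimates (Anschel, Baram, and Shimkin, n.d.)
- Hindsight Experience Replay replays episodes with the goal replaced by one that was actually achieved, which helps when rewards are sparse (Andrychowicz et al., n.d.)
- Distributional RL models the distribution of returns rather than only its expectation, here via quantile regression (Dabney et al., n.d.)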
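The first two improvements in the list above change only how the bootstrap target is computed. The sketch below contrasts them with the plain DQN target for a single transition. The function names and the representation of Q-values as plain arrays over actions are assumptions for illustration; terminal-state handling is omitted.

```python
import numpy as np

GAMMA = 0.99  # assumed discount factor


def dqn_target(q_next_target, r, gamma=GAMMA):
    """Plain target: the target network both selects and evaluates."""
    return r + gamma * np.max(q_next_target)


def double_dqn_target(q_next_online, q_next_target, r, gamma=GAMMA):
    """Double Q-learning: online network selects, target network evaluates."""
    best_action = int(np.argmax(q_next_online))
    return r + gamma * q_next_target[best_action]


def averaged_dqn_target(q_next_snapshots, r, gamma=GAMMA):
    """Averaged-DQN: average the Q-values of K past snapshots, then maximize."""
    mean_q = np.mean(q_next_snapshots, axis=0)
    return r + gamma * np.max(mean_q)
```

For example, with online values `[1.0, 2.0, 0.5]` and target values `[3.0, 1.0, 0.0]`, the plain target bootstraps from `3.0`, while the double Q-learning target bootstraps from `1.0`, the target network's value for the action preferred by the online network.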

## Bibliography

Andrychowicz, Marcin, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. n.d. “Hindsight Experience Replay.” In *Advances in Neural Information Processing Systems*, 5048–58.

Anschel, Oron, Nir Baram, and Nahum Shimkin. n.d. “Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning.” In *Proceedings of the 34th International Conference on Machine Learning-Volume 70*, 176–85. JMLR.org.

Dabney, Will, Mark Rowland, Marc G Bellemare, and Rémi Munos. n.d. “Distributional Reinforcement Learning with Quantile Regression.” In *Thirty-Second AAAI Conference on Artificial Intelligence*.

Van Hasselt, Hado, Arthur Guez, and David Silver. n.d. “Deep Reinforcement Learning with Double Q-Learning.” In *Thirtieth AAAI Conference on Artificial Intelligence*.

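Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. “Playing Atari with Deep Reinforcement Learning.” *arXiv Preprint arXiv:1312.5602*.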