
Playing Atari with Deep RL

Playing Atari With Deep RL (Mnih et al., 2013)

Preprocessing Steps

  1. Obtain the raw frames of size \(210 \times 160\)
  2. Convert to grayscale and downsample to \(110 \times 84\)
  3. Crop an \(84 \times 84\) region that roughly captures the playing area
  4. Stack the last 4 frames in the history to form the \(84 \times 84 \times 4\) network input (see the sketch below)
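
A minimal sketch of this pipeline, assuming OpenCV and NumPy and a raw RGB frame from the emulator; the centre crop and the frame padding at episode start are simplifications of mine, not details from the paper.

```python
from collections import deque

import cv2
import numpy as np


def preprocess(frame):
    """Grayscale a raw 210x160 RGB frame, downsample to 110x84, crop 84x84."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)                      # (210, 160)
    small = cv2.resize(gray, (84, 110), interpolation=cv2.INTER_AREA)   # (110, 84)
    top = (110 - 84) // 2   # crop a square that roughly covers the playing area
    return small[top:top + 84, :]                                       # (84, 84)


history = deque(maxlen=4)   # the last 4 preprocessed frames


def stacked_input(frame):
    """Return the 84x84x4 network input built from the frame history."""
    history.append(preprocess(frame))
    while len(history) < 4:              # repeat the first frame at episode start
        history.append(history[-1])
    return np.stack(history, axis=-1)    # shape (84, 84, 4)
```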

DQN

  1. Use of an experience replay buffer
  2. A separate target network stabilizes the optimization targets:

\begin{equation} \delta = r_t + \gamma \max_{a} Q(s_{t+1}, a; \theta') - Q(s_t, a_t; \theta) \end{equation}

The network parameterized by \(\theta'\) is a snapshot of the network from an earlier point in time, so the optimization target does not change as rapidly.

  3. Clip \(\delta\) to \(\left[-1, 1\right]\) (see the sketch after this list)
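
A minimal sketch of the resulting update, assuming PyTorch. The network here is a toy stand-in for the convolutional architecture, the hyperparameters are illustrative, and the Huber (smooth L1) loss is used because it is equivalent to clipping the TD error \(\delta\) to \(\left[-1, 1\right]\) in the gradient.

```python
import copy
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

GAMMA = 0.99
replay = deque(maxlen=100_000)   # experience replay buffer of (s, a, r, s_next, done)

# Toy stand-in for the convolutional Q-network over the 84x84x4 input.
q_net = nn.Sequential(nn.Flatten(), nn.Linear(84 * 84 * 4, 256), nn.ReLU(), nn.Linear(256, 4))
target_net = copy.deepcopy(q_net)   # theta': a frozen snapshot of q_net
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)


def dqn_update(batch_size=32):
    """One minibatch update; assumes replay stores float frame tensors."""
    s, a, r, s_next, done = zip(*random.sample(replay, batch_size))
    s, s_next = torch.stack(s), torch.stack(s_next)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)

    # Q(s_t, a_t; theta) for the actions actually taken.
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    # r_t + gamma * max_a Q(s_{t+1}, a; theta'), computed with the frozen snapshot.
    with torch.no_grad():
        target = r + GAMMA * (1 - done) * target_net(s_next).max(dim=1).values
    # Huber loss: squared loss near zero, gradient clipped to [-1, 1] for large errors.
    loss = F.smooth_l1_loss(q_sa, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


def refresh_target():
    """Periodically re-snapshot theta' <- theta so the targets move slowly."""
    target_net.load_state_dict(q_net.state_dict())
```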

Improving DQN


Bibliography

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.

Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. In Thirtieth AAAI Conference on Artificial Intelligence.

Anschel, O., Baram, N., & Shimkin, N. (2017). Averaged-DQN: variance reduction and stabilization for deep reinforcement learning. In Proceedings of the 34th International Conference on Machine Learning, Volume 70 (pp. 176–185).

Andrychowicz, M., Wolski, F., Ray, A., Schneider, J., Fong, R., Welinder, P., McGrew, B., … (2017). Hindsight experience replay. In Advances in Neural Information Processing Systems (pp. 5048–5058).

Dabney, W., Rowland, M., Bellemare, M. G., & Munos, R. (2018). Distributional reinforcement learning with quantile regression. In Thirty-Second AAAI Conference on Artificial Intelligence.
