Playing Atari with Deep RL

Playing Atari With Deep RL (Mnih et al. 2013)

Preprocessing Steps

  1. Obtain raw pixels of size \(210 \times 160\)
  2. Grayscale and downsample to \(110 \times 84\)
  3. Crop representative \(84 \times 84\) region
  4. Stack the last 4 frames in history to form the \(84 \times 84 \times 4\) input (see the sketch after this list)
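
A minimal sketch of this preprocessing pipeline, assuming NumPy and OpenCV (cv2); the crop offsets and function names are illustrative, not the paper's exact implementation:

```python
from collections import deque

import cv2
import numpy as np


def preprocess(frame_rgb: np.ndarray) -> np.ndarray:
    """Raw 210x160x3 RGB frame -> 84x84 grayscale crop."""
    gray = cv2.cvtColor(frame_rgb, cv2.COLOR_RGB2GRAY)                 # 210x160
    small = cv2.resize(gray, (84, 110), interpolation=cv2.INTER_AREA)  # 110x84 (dsize is width, height)
    return small[18:102, :]                                            # crop 84x84 playing area (offset is illustrative)


class FrameStack:
    """Maintain the last 4 preprocessed frames as the 84x84x4 network input."""

    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)

    def reset(self, frame_rgb: np.ndarray) -> np.ndarray:
        first = preprocess(frame_rgb)
        for _ in range(self.frames.maxlen):
            self.frames.append(first)       # pad history with the first frame
        return np.stack(self.frames, axis=-1)   # (84, 84, 4)

    def step(self, frame_rgb: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess(frame_rgb))
        return np.stack(self.frames, axis=-1)   # (84, 84, 4)
```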

DQN

  1. Use of an experience replay buffer: past transitions \((s_t, a_t, r_t, s_{t+1})\) are stored and sampled at random for updates, breaking correlations between consecutive samples
  2. Separate target network stabilizes optimization targets:

\begin{equation} \delta = r_t + \gamma \max_a Q(s_{t+1}, a; \theta') - Q(s_t, a_t; \theta) \end{equation}

The network parameterized by \(\theta'\) is a snapshot of the online network taken at an earlier point in time, so the optimization target does not change as rapidly.

  3. Clip the TD error \(\delta\) to \(\left[-1, 1\right]\) (see the sketch after this list)
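
Below is a minimal sketch tying the three ingredients together (experience replay, target network \(\theta'\), clipped TD error), assuming PyTorch. The convolutional architecture follows the paper's description, while the optimizer, learning rate, discount, buffer size, and sync schedule here are illustrative.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


def make_q_net(n_actions: int) -> nn.Module:
    # Two conv layers + one 256-unit hidden layer over the 4x84x84 input.
    return nn.Sequential(
        nn.Conv2d(4, 16, kernel_size=8, stride=4), nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
        nn.Linear(256, n_actions),
    )


n_actions, gamma = 4, 0.99
q_net = make_q_net(n_actions)                    # theta
target_net = make_q_net(n_actions)               # theta'
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)

# Experience replay buffer of (s, a, r, s_next, done) tuples, each stored as tensors:
# s, s_next float (4, 84, 84); a int64 scalar; r, done float scalars.
replay = deque(maxlen=100_000)


def train_step(batch_size: int = 32) -> None:
    s, a, r, s_next, done = map(torch.stack, zip(*random.sample(replay, batch_size)))

    q_sa = q_net(s).gather(1, a.view(-1, 1)).squeeze(1)          # Q(s_t, a_t; theta)
    with torch.no_grad():                                        # target uses the frozen theta'
        target = r + gamma * (1 - done) * target_net(s_next).max(dim=1).values

    # Huber loss has the same gradient as a squared-error loss with the
    # TD error delta clipped to [-1, 1].
    loss = F.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


def sync_target() -> None:
    # Called every fixed number of steps: snapshot theta into theta' so the
    # regression target stays fixed between syncs.
    target_net.load_state_dict(q_net.state_dict())
```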

Improving DQN

Bibliography

Andrychowicz, Marcin, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. 2017. “Hindsight Experience Replay.” In Advances in Neural Information Processing Systems, 5048–58.

Anschel, Oron, Nir Baram, and Nahum Shimkin. 2017. “Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning.” In Proceedings of the 34th International Conference on Machine Learning, Volume 70, 176–85. JMLR.org.

Dabney, Will, Mark Rowland, Marc G. Bellemare, and Rémi Munos. 2018. “Distributional Reinforcement Learning with Quantile Regression.” In Thirty-Second AAAI Conference on Artificial Intelligence.

Van Hasselt, Hado, Arthur Guez, and David Silver. 2016. “Deep Reinforcement Learning with Double Q-Learning.” In Thirtieth AAAI Conference on Artificial Intelligence.

Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. “Playing Atari with Deep Reinforcement Learning.” arXiv preprint arXiv:1312.5602.