
Playing Atari with Deep RL

Playing Atari with Deep RL (Mnih et al. 2013)

Preprocessing Steps

  1. Obtain the raw 210×160 RGB frame from the emulator
  2. Convert to grayscale and downsample to 110×84
  3. Crop a representative 84×84 region covering the playing area
  4. Stack the last 4 frames in history to form the 84×84×4 input (sketched below)
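A minimal sketch of this pipeline in NumPy, assuming frame_rgb is the raw 210×160×3 uint8 frame from the emulator; the nearest-neighbour subsample and the crop offset are illustrative (the paper uses a proper image rescale, and the exact crop is game-dependent):

```python
import numpy as np
from collections import deque

def preprocess(frame_rgb):
    """Grayscale, downsample to 110x84, then crop an 84x84 playing-area window."""
    gray = frame_rgb @ np.array([0.299, 0.587, 0.114])   # (210, 160) luminance
    rows = np.linspace(0, 209, 110).astype(int)          # nearest-neighbour resize
    cols = np.linspace(0, 159, 84).astype(int)
    small = gray[rows][:, cols]                           # (110, 84)
    return small[13:97, :].astype(np.uint8)               # (84, 84); offset is illustrative

class FrameStack:
    """Keep the last 4 preprocessed frames and stack them into the DQN input."""
    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def push(self, frame_rgb):
        self.frames.append(preprocess(frame_rgb))

    def state(self):
        return np.stack(self.frames, axis=-1)              # (84, 84, 4)
```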

DQN

  1. Experience replay buffer: transitions are stored and sampled at random, which breaks the correlation between consecutive updates
  2. A separate target network stabilizes the optimization target:

\[
\delta = r_t + \gamma \max_{a} Q(s_{t+1}, a; \theta^{-}) - Q(s_t, a_t; \theta)
\]

The target network parameterized by \(\theta^{-}\) is a snapshot of the online network taken at an earlier point in time, so the optimization target doesn’t change as rapidly.

  3. Clip δ to [-1, 1]
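A minimal sketch of these mechanisms in NumPy, assuming q_online and q_target are placeholder callables that map a batch of states to per-action Q-values (the online network with parameters \(\theta\) and the frozen snapshot \(\theta^{-}\)); the buffer capacity and batch size are illustrative:

```python
import random
from collections import deque
import numpy as np

class ReplayBuffer:
    """Uniform experience replay: store transitions, sample minibatches at random."""
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buf.append((s, a, r, s_next, done))

    def sample(self, batch_size=32):
        batch = random.sample(self.buf, batch_size)
        s, a, r, s_next, done = map(np.array, zip(*batch))
        return s, a, r, s_next, done

def clipped_td_errors(q_online, q_target, batch, gamma=0.99):
    """delta = r_t + gamma * max_a Q(s_{t+1}, a; theta^-) - Q(s_t, a_t; theta), clipped to [-1, 1]."""
    s, a, r, s_next, done = batch
    q_sa = q_online(s)[np.arange(len(a)), a]                        # Q(s_t, a_t; theta)
    target = r + gamma * (1 - done) * q_target(s_next).max(axis=1)  # bootstrapped from theta^-
    return np.clip(target - q_sa, -1.0, 1.0)
```

In practice \(\theta^{-}\) is refreshed from \(\theta\) only every fixed number of updates, so the bootstrapped target moves slowly.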

Improving DQN

References

Bibliography

Andrychowicz, Marcin, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. 2017. “Hindsight Experience Replay.” In Advances in Neural Information Processing Systems, 5048–58.

Anschel, Oron, Nir Baram, and Nahum Shimkin. 2017. “Averaged-DQN: Variance Reduction and Stabilization for Deep Reinforcement Learning.” In Proceedings of the 34th International Conference on Machine Learning, Volume 70, 176–85. JMLR.org.

Dabney, Will, Mark Rowland, Marc G. Bellemare, and Rémi Munos. 2018. “Distributional Reinforcement Learning with Quantile Regression.” In Thirty-Second AAAI Conference on Artificial Intelligence.

Van Hasselt, Hado, Arthur Guez, and David Silver. 2016. “Deep Reinforcement Learning with Double Q-Learning.” In Thirtieth AAAI Conference on Artificial Intelligence.

Mnih, Volodymyr, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. “Playing Atari with Deep Reinforcement Learning.” arXiv preprint arXiv:1312.5602.