Jethro's Braindump

Information-Theoretic Reinforcement Learning

Can we learn without any reward function at all?

Identities

entropy
\(\mathcal{H}(p(x)) = - E_{x \sim p(x)}[\log p(x)]\)
mutual information
\(\mathcal{I}(x;y) = D_{KL}(p(x,y) || p(x)p(y))\)
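For discrete distributions, both identities can be checked numerically. A minimal sketch (function names are mine, not from any particular RL codebase), computing entropy from a probability vector and mutual information from a joint probability table:

```python
import numpy as np

def entropy(p):
    """H(p) = -E_{x~p}[log p(x)], using the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def mutual_information(p_xy):
    """I(x;y) = D_KL(p(x,y) || p(x)p(y)) for a joint table p_xy[i, j]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = p_xy > 0
    return np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz]))

# Sanity check: independent variables have zero mutual information.
p_joint = np.outer([0.5, 0.5], [0.25, 0.75])
print(mutual_information(p_joint))
```

For a joint built as an outer product of marginals, the KL divergence above is exactly zero, matching the definition.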

Information theoretic quantities in RL

\(\pi(s)\)
state marginal distribution of policy \(\pi\)
\(\mathcal{H}(\pi(s))\)
state marginal entropy of policy \(\pi\)
empowerment
\(\mathcal{I}(s_{t+1};a_t) = \mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1}|a_t)\)
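In a tabular setting, empowerment can be computed directly from the decomposition \(\mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1}|a_t)\). A minimal sketch under assumed inputs (a transition table `p_next[a, s']` giving \(p(s_{t+1}|a_t)\) from a fixed current state, and an action distribution `p_a`; the variable names are illustrative):

```python
import numpy as np

def _entropy(p):
    """H(p) with the convention 0 log 0 = 0."""
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def empowerment(p_next, p_a):
    """I(s_{t+1}; a_t) = H(s_{t+1}) - H(s_{t+1} | a_t).

    p_next[a, s'] = p(s_{t+1} = s' | a_t = a) from the current state,
    p_a[a]        = probability of taking action a.
    """
    p_next = np.asarray(p_next, dtype=float)
    p_a = np.asarray(p_a, dtype=float)
    # Marginal next-state distribution: p(s') = sum_a p(a) p(s'|a)
    p_marginal = p_a @ p_next
    h_s = _entropy(p_marginal)
    # Conditional entropy: H(s'|a) = sum_a p(a) H(p(s'|a))
    h_s_given_a = sum(pa * _entropy(row) for pa, row in zip(p_a, p_next))
    return h_s - h_s_given_a

# Two actions leading deterministically to distinct states:
# empowerment equals H(a_t) = log 2.
print(empowerment(np.eye(2), [0.5, 0.5]))
```

When every action leads to the same next-state distribution, the marginal and conditional entropies coincide and empowerment is zero; this is why empowerment is used as an intrinsic signal for seeking states where actions have distinguishable consequences.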

Papers

Bibliography

Eysenbach, Benjamin, Abhishek Gupta, Julian Ibarz, and Sergey Levine. n.d. “Diversity Is All You Need: Learning Skills Without a Reward Function.” http://arxiv.org/abs/1802.06070v6.

Pong, Vitchyr H., Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, and Sergey Levine. n.d. “Skew-Fit: State-Covering Self-Supervised Reinforcement Learning.” http://arxiv.org/abs/1903.03698v2.
