Jethro's Braindump

Information Theoretic Reinforcement Learning

Can we learn without any reward function at all?

Identities

entropy
\(\mathcal{H}(p(x)) = - E_{x \sim p(x)}[\log p(x)]\)
mutual information
\(\mathcal{I}(x;y) = D_{KL}(p(x,y) || p(x)p(y))\)
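Both identities can be checked numerically for discrete distributions. A minimal sketch (the function names `entropy` and `mutual_information` are my own, not from any cited paper): entropy is the expected negative log-probability, and mutual information is the KL divergence between the joint \(p(x,y)\) and the product of marginals \(p(x)p(y)\).

```python
import numpy as np

def entropy(p):
    """H(p) = -E_{x~p}[log p(x)], in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: 0 log 0 = 0
    return -np.sum(p * np.log(p))

def mutual_information(pxy):
    """I(x;y) = KL(p(x,y) || p(x)p(y)) from a joint probability table."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)  # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)  # marginal p(y)
    mask = pxy > 0
    return np.sum(pxy[mask] * np.log(pxy[mask] / (px * py)[mask]))

# Uniform over 4 outcomes: H = log 4 ≈ 1.386 nats.
print(entropy([0.25, 0.25, 0.25, 0.25]))

# Independent joint: I(x;y) = 0; perfectly correlated: I(x;y) = log 2.
print(mutual_information(np.outer([0.5, 0.5], [0.5, 0.5])))
print(mutual_information(np.diag([0.5, 0.5])))
```

The KL form and the \(\mathcal{H}(x) - \mathcal{H}(x|y)\) form agree on these tables, which is a quick sanity check when implementing intrinsic-reward estimators.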

Information theoretic quantities in RL

\(\pi(s)\)
state marginal distribution of policy \(\pi\)
\(\mathcal{H}(\pi(s))\)
state marginal entropy of policy \(\pi\)
empowerment
\(\mathcal{I}(s_{t+1};a_t) = \mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1}|a_t)\)
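Empowerment can be computed exactly in a tabular toy MDP using the decomposition above. A minimal sketch, assuming a hypothetical 2-action, 2-state one-step dynamics model `p_s_given_a` and a uniform action distribution (both are illustrative choices, not from the cited papers):

```python
import numpy as np

# Hypothetical deterministic dynamics: row a gives p(s_{t+1} | a_t = a).
# Each action reaches a distinct state, so actions are maximally
# distinguishable from outcomes: empowerment should equal H(a) = log 2.
p_s_given_a = np.array([[1.0, 0.0],
                        [0.0, 1.0]])
p_a = np.array([0.5, 0.5])  # uniform over the two actions

joint = p_a[:, None] * p_s_given_a  # p(a_t, s_{t+1})
p_s = joint.sum(axis=0)             # marginal p(s_{t+1})

# I(s_{t+1}; a_t) = H(s_{t+1}) - H(s_{t+1} | a_t)
h_s = -np.sum(p_s[p_s > 0] * np.log(p_s[p_s > 0]))
h_s_given_a = -np.sum(joint[joint > 0] * np.log(p_s_given_a[joint > 0]))
empowerment = h_s - h_s_given_a
print(empowerment)  # log 2 ≈ 0.693 nats
```

With noisier dynamics (rows of `p_s_given_a` closer to uniform), \(\mathcal{H}(s_{t+1}|a_t)\) grows and empowerment shrinks toward zero, matching the intuition that empowerment rewards control over future states.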

Papers

Bibliography

Pong, V. H., Dalal, M., Lin, S., Nair, A., Bahl, S., & Levine, S. (2019). Skew-Fit: State-Covering Self-Supervised Reinforcement Learning. CoRR.

Eysenbach, B., Gupta, A., Ibarz, J., & Levine, S. (2018). Diversity Is All You Need: Learning Skills Without a Reward Function. CoRR.
