Jethro's Braindump

Information-Theoretic Reinforcement Learning

Can we learn without any reward function at all?

Identities

entropy
\(\mathcal{H}(p(x)) = - E_{x \sim p(x)}[\log p(x)]\)
mutual information
\(\mathcal{I}(x;y) = D_{KL}(p(x,y) || p(x)p(y))\)
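For discrete distributions, both identities can be checked numerically. A minimal sketch (function names are mine, not from any particular RL codebase), computing entropy from a probability vector and mutual information from a joint probability table:

```python
import numpy as np

def entropy(p):
    """H(p) = -E_{x~p}[log p(x)], using the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def mutual_information(p_xy):
    """I(x;y) = D_KL(p(x,y) || p(x)p(y)) for a joint table p_xy[i, j]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = p_xy > 0
    return np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x * p_y)[nz]))

# Sanity check: independent variables have zero mutual information.
p_joint = np.outer([0.5, 0.5], [0.25, 0.75])
print(mutual_information(p_joint))
```

For a joint built as an outer product of marginals, the KL divergence above is exactly zero, matching the definition.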

Information theoretic quantities in RL

\(\pi(s)\)
state marginal distribution of policy \(\pi\)
\(\mathcal{H}(\pi(s))\)
state marginal entropy of policy \(\pi\)
empowerment
\(\mathcal{I}(s_{t+1};a_t) = \mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1}|a_t)\)
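In a tabular setting, empowerment can be computed directly from the decomposition \(\mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1}|a_t)\). A minimal sketch under assumed inputs (a transition table `p_next[a, s']` giving \(p(s_{t+1}|a_t)\) from a fixed current state, and an action distribution `p_a`; the variable names are illustrative):

```python
import numpy as np

def _entropy(p):
    """H(p) with the convention 0 log 0 = 0."""
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def empowerment(p_next, p_a):
    """I(s_{t+1}; a_t) = H(s_{t+1}) - H(s_{t+1} | a_t).

    p_next[a, s'] = p(s_{t+1} = s' | a_t = a) from the current state,
    p_a[a]        = probability of taking action a.
    """
    p_next = np.asarray(p_next, dtype=float)
    p_a = np.asarray(p_a, dtype=float)
    # Marginal next-state distribution: p(s') = sum_a p(a) p(s'|a)
    p_marginal = p_a @ p_next
    h_s = _entropy(p_marginal)
    # Conditional entropy: H(s'|a) = sum_a p(a) H(p(s'|a))
    h_s_given_a = sum(pa * _entropy(row) for pa, row in zip(p_a, p_next))
    return h_s - h_s_given_a

# Two actions leading deterministically to distinct states:
# empowerment equals H(a_t) = log 2.
print(empowerment(np.eye(2), [0.5, 0.5]))
```

When every action leads to the same next-state distribution, the marginal and conditional entropies coincide and empowerment is zero; this is why empowerment is used as an intrinsic signal for seeking states where actions have distinguishable consequences.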

Papers

Bibliography

Eysenbach, Benjamin, Abhishek Gupta, Julian Ibarz, and Sergey Levine. n.d. “Diversity Is All You Need: Learning Skills Without a Reward Function.” http://arxiv.org/abs/1802.06070v6.

Pong, Vitchyr H., Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, and Sergey Levine. n.d. “Skew-Fit: State-Covering Self-Supervised Reinforcement Learning.” http://arxiv.org/abs/1903.03698v2.
