Information-Theoretic Reinforcement Learning
Can we learn without any reward function at all?
Identities
- entropy
- \(\mathcal{H}(p(x)) = - E_{x \sim p(x)}[\log p(x)]\)
- mutual information
- \(\mathcal{I}(x;y) = D_{KL}(p(x,y) || p(x)p(y))\)
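Both identities are easy to check numerically for discrete distributions. A minimal sketch (the example joint and function names are mine, not from any source) computing \(\mathcal{H}\) and \(\mathcal{I}\) with numpy, and verifying the equivalent form \(\mathcal{I}(x;y) = \mathcal{H}(x) + \mathcal{H}(y) - \mathcal{H}(x,y)\):

```python
import numpy as np

def entropy(p):
    """H(p) = -E_{x~p}[log p(x)] for a discrete distribution p (in nats)."""
    p = p[p > 0]  # convention: 0 log 0 = 0
    return -np.sum(p * np.log(p))

def mutual_information(p_xy):
    """I(x;y) = KL(p(x,y) || p(x)p(y)) for a discrete joint p_xy[i, j]."""
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x), shape (|X|, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, |Y|)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask]))

# Sanity check: I(x;y) = H(x) + H(y) - H(x,y) gives the same value.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
assert np.isclose(
    mutual_information(p_xy),
    entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0)) - entropy(p_xy.ravel()),
)
```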
Information-theoretic quantities in RL
- \(\pi(s)\)
- state marginal distribution of policy \(\pi\)
- \(\mathcal{H}(\pi(s))\)
- state marginal entropy of policy \(\pi\)
- empowerment
- \(\mathcal{I}(s_{t+1};a_t) = \mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1}|a_t)\)
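To make these concrete, a small tabular sketch (toy setup, all names assumed): a plug-in estimate of \(\mathcal{H}(\pi(s))\) from states visited under \(\pi\), and empowerment at a fixed state where the transition probabilities \(p(s_{t+1}|a_t)\) are known.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def state_marginal_entropy(visited_states, n_states):
    """Plug-in estimate of H(pi(s)) from integer state indices visited under pi."""
    counts = np.bincount(visited_states, minlength=n_states)
    return entropy(counts / counts.sum())

def empowerment(p_next_given_a, p_a):
    """I(s';a) = H(s') - H(s'|a), with p_next_given_a[a, s'] and action dist p_a."""
    p_next = p_a @ p_next_given_a                      # marginal over s_{t+1}
    h_cond = sum(p_a[a] * entropy(p_next_given_a[a])   # H(s_{t+1}|a_t)
                 for a in range(len(p_a)))
    return entropy(p_next) - h_cond

print(state_marginal_entropy(np.array([0, 1, 2, 2]), n_states=4))  # ~1.04 nats

# Two actions leading to distinct next states -> maximal empowerment, log 2.
p_next_given_a = np.array([[1.0, 0.0],
                           [0.0, 1.0]])
print(empowerment(p_next_given_a, np.array([0.5, 0.5])))  # ~0.693
```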
Papers
- Skew-Fit (Pong et al. 2019)
- maximizes the state marginal entropy \(\mathcal{H}(\pi(s))\) by skewing goal sampling toward rarely visited states
- Diversity Is All You Need (DIAYN) (Eysenbach et al. 2018)
- learns skills without a reward function by maximizing the mutual information between states and a latent skill \(z\) (sketched below)
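DIAYN maximizes a variational lower bound on \(\mathcal{I}(s;z)\), which gives the intrinsic reward \(r = \log q_\phi(z|s) - \log p(z)\): a learned skill discriminator \(q_\phi\) plus a fixed uniform skill prior \(p(z)\). A hedged sketch of that reward term only (the discriminator architecture is illustrative; the paper trains it jointly with a SAC policy):

```python
import math
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """q_phi(z|s): predicts which skill z produced state s."""
    def __init__(self, state_dim, n_skills, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_skills),
        )

    def log_prob(self, state, z):
        # log q_phi(z|s) for integer skill labels z of shape (batch,)
        log_q = torch.log_softmax(self.net(state), dim=-1)
        return log_q.gather(-1, z.unsqueeze(-1)).squeeze(-1)

def diayn_reward(disc, state, z, n_skills):
    """Intrinsic reward r = log q_phi(z|s) - log p(z), with p(z) uniform."""
    with torch.no_grad():
        return disc.log_prob(state, z) + math.log(n_skills)

# usage: reward for a batch of states collected under skill labels z
disc = Discriminator(state_dim=4, n_skills=8)
r = diayn_reward(disc, torch.randn(32, 4), torch.randint(0, 8, (32,)), n_skills=8)
```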
Bibliography
Eysenbach, Benjamin, Abhishek Gupta, Julian Ibarz, and Sergey Levine. 2018. “Diversity Is All You Need: Learning Skills without a Reward Function.” http://arxiv.org/abs/1802.06070v6.
Pong, Vitchyr H., Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, and Sergey Levine. 2019. “Skew-Fit: State-Covering Self-Supervised Reinforcement Learning.” http://arxiv.org/abs/1903.03698v2.