# Information-Theoretic Reinforcement Learning

Can we learn *without* any reward function at all?

## Identities

- entropy
  - \(\mathcal{H}(p(x)) = -\mathbb{E}_{x \sim p(x)}[\log p(x)]\)
- mutual information
  - \(\mathcal{I}(x;y) = D_{KL}(p(x,y) \,\|\, p(x)p(y))\)
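
As a small numerical check of these definitions, the following Python sketch assumes NumPy and discrete distributions stored as arrays. The helper names (`entropy`, `kl_divergence`, `mutual_information`) and the example joint table are illustrative only. It computes mutual information as the KL divergence between the joint and the product of the marginals, then compares it with the equivalent form \(\mathcal{H}(x) + \mathcal{H}(y) - \mathcal{H}(x,y)\).

```python
import numpy as np


def entropy(p):
    """H(p) = -E_{x~p}[log p(x)] in nats, treating 0 log 0 as 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(np.sum(p * np.log(1.0 / p)))


def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions, assuming q > 0 wherever p > 0."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / q[nz])))


def mutual_information(pxy):
    """I(x;y) = D_KL(p(x,y) || p(x)p(y)) from a joint table pxy[i, j] = p(x=i, y=j)."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1, keepdims=True)  # p(x), shape (X, 1)
    py = pxy.sum(axis=0, keepdims=True)  # p(y), shape (1, Y)
    return kl_divergence(pxy, px * py)


# Hypothetical joint distribution in which x and y tend to agree.
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
mi = mutual_information(pxy)
# Equivalent entropy form: I(x;y) = H(x) + H(y) - H(x,y).
mi_alt = entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0)) - entropy(pxy)
print(round(mi, 4), round(mi_alt, 4))  # both ≈ 0.1927
```

For an independent joint, \(p(x,y) = p(x)p(y)\), the KL term and hence the mutual information is zero.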

## Information-theoretic quantities in RL

- \(\pi(s)\)
  - state marginal distribution of policy \(\pi\)
- \(\mathcal{H}(\pi(s))\)
  - state marginal entropy of policy \(\pi\)
- empowerment
  - \(\mathcal{I}(s_{t+1};a_t) = \mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1}|a_t)\)
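
For small discrete problems these quantities can be computed directly. The sketch below assumes NumPy, integer-indexed states, and a tabular next-state model \(p(s_{t+1} \mid a_t)\) at a fixed current state. All function names and the toy dynamics are hypothetical. It estimates \(\mathcal{H}(\pi(s))\) from visited states with a plug-in (empirical frequency) estimate, and evaluates \(\mathcal{I}(s_{t+1};a_t) = \mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1}|a_t)\) for a given action distribution. Empowerment is usually taken as the maximum of this mutual information over action distributions; the sketch only evaluates it for a fixed one.

```python
import numpy as np


def entropy(p):
    """Entropy in nats of a discrete distribution, treating 0 log 0 as 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(np.sum(p * np.log(1.0 / p)))


def state_marginal_entropy(states, n_states):
    """Plug-in estimate of H(pi(s)) from a 1-D array of visited state indices."""
    counts = np.bincount(states, minlength=n_states).astype(float)
    return entropy(counts / counts.sum())


def action_next_state_mi(p_a, p_next_given_a):
    """I(s_{t+1}; a_t) = H(s_{t+1}) - H(s_{t+1} | a_t) at a fixed current state.

    p_a:            shape (A,), distribution over actions a_t.
    p_next_given_a: shape (A, S), row a is p(s_{t+1} | a_t = a).
    """
    p_a = np.asarray(p_a, dtype=float)
    P = np.asarray(p_next_given_a, dtype=float)
    p_next = p_a @ P  # marginal p(s_{t+1}) = sum_a p(a) p(s_{t+1} | a)
    h_cond = sum(p_a[a] * entropy(P[a]) for a in range(len(p_a)))
    return entropy(p_next) - h_cond


rng = np.random.default_rng(0)
narrow = rng.integers(0, 2, size=10_000)  # hypothetical policy visiting only states {0, 1}
wide = rng.integers(0, 8, size=10_000)    # hypothetical policy spreading over all 8 states
print(state_marginal_entropy(narrow, 8), state_marginal_entropy(wide, 8))  # roughly log 2 and log 8

A = 4
uniform = np.full(A, 1.0 / A)
controllable = np.eye(A)                        # action a always reaches state a
uncontrollable = np.tile(np.eye(A)[0], (A, 1))  # every action reaches state 0
print(action_next_state_mi(uniform, controllable))    # log 4 ≈ 1.386
print(action_next_state_mi(uniform, uncontrollable))  # 0.0
```

Actions that deterministically reach distinct states give the largest possible value for four uniformly chosen actions, \(\log 4\), while dynamics that ignore the action give zero. This matches the reading of empowerment as how much control the agent's action has over its next state.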

## Papers

- Skew-Fit (Pong et al., n.d.)
- Diversity Is All You Need (Eysenbach et al., n.d.)

## Bibliography

Eysenbach, Benjamin, Abhishek Gupta, Julian Ibarz, and Sergey Levine. n.d. “Diversity Is All You Need: Learning Skills Without a Reward Function.” http://arxiv.org/abs/1802.06070v6.

Pong, Vitchyr H., Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, and Sergey Levine. n.d. “Skew-Fit: State-Covering Self-Supervised Reinforcement Learning.” http://arxiv.org/abs/1903.03698v2.