Can we learn *without* any reward function at all?

## Identities

- entropy: \(\mathcal{H}(p(x)) = - E_{x \sim p(x)}[\log p(x)]\)
- mutual information: \(\mathcal{I}(x;y) = D_{KL}(p(x,y) \,\|\, p(x)p(y))\)
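The two identities above can be checked numerically for discrete distributions. A minimal sketch (function names are illustrative, not from any library):

```python
import numpy as np

def entropy(p):
    """H(p) = -E_{x~p}[log p(x)] for a discrete distribution p (nats)."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]  # 0 * log 0 is taken as 0
    return -np.sum(nz * np.log(nz))

def mutual_information(p_xy):
    """I(x;y) = KL(p(x,y) || p(x)p(y)) for a joint probability table p_xy[x, y]."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask]))
```

Sanity checks: a uniform distribution over \(n\) outcomes has entropy \(\log n\); an independent joint has zero mutual information; a perfectly correlated binary joint has \(\mathcal{I} = \log 2\).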

## Information theoretic quantities in RL

- \(\pi(s)\): state marginal distribution of policy \(\pi\)
- \(\mathcal{H}(\pi(s))\): state marginal entropy of policy \(\pi\)
- empowerment: \(\mathcal{I}(s_{t+1};a_t) = \mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1} \mid a_t)\)
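For a small discrete MDP, empowerment can be computed directly from the decomposition \(\mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1} \mid a_t)\). A sketch, assuming a tabular transition model and an action distribution with full support (argument names are illustrative):

```python
import numpy as np

def empowerment(p_a, p_next_given_a):
    """I(s_{t+1}; a_t) = H(s_{t+1}) - H(s_{t+1} | a_t), in nats.

    p_a: action distribution, shape (A,); assumed strictly positive.
    p_next_given_a: transition table p(s' | a), shape (A, S).
    """
    p_a = np.asarray(p_a, dtype=float)
    p_next_given_a = np.asarray(p_next_given_a, dtype=float)
    p_joint = p_a[:, None] * p_next_given_a      # p(a, s')
    p_next = p_joint.sum(axis=0)                 # marginal p(s')

    nz = p_next[p_next > 0]
    h_next = -np.sum(nz * np.log(nz))            # H(s')

    mask = p_next_given_a > 0
    h_next_given_a = -np.sum(
        p_joint[mask] * np.log(p_next_given_a[mask])
    )                                            # H(s' | a) = -E[log p(s'|a)]
    return h_next - h_next_given_a
```

With deterministic transitions that map each of two equiprobable actions to a distinct state, empowerment is \(\log 2\): the agent's action fully determines the next state.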

## Papers

- Skew-Fit (Pong et al., 2019)
- Diversity is All You Need (Eysenbach et al., 2018)

## Bibliography

Pong, V. H., Dalal, M., Lin, S., Nair, A., Bahl, S., & Levine, S., *Skew-Fit: state-covering self-supervised reinforcement learning*, CoRR (2019). ↩

Eysenbach, B., Gupta, A., Ibarz, J., & Levine, S., *Diversity is all you need: learning skills without a reward function*, CoRR (2018). ↩