Information-Theoretic Reinforcement Learning
Can we learn without any reward function at all?
Identities
- entropy
- \(\mathcal{H}(p(x)) = - E_{x \sim p(x)}[\log p(x)]\)
- mutual information
- \(\mathcal{I}(x;y) = D_{KL}(p(x,y) || p(x)p(y))\)
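Both identities are easy to check numerically for discrete distributions. A minimal sketch (the example joint and function names are mine, not from any source) computing \(\mathcal{H}\) and \(\mathcal{I}\) with numpy, and verifying the equivalent form \(\mathcal{I}(x;y) = \mathcal{H}(x) + \mathcal{H}(y) - \mathcal{H}(x,y)\):

```python
import numpy as np

def entropy(p):
    """H(p) = -E_{x~p}[log p(x)] for a discrete distribution p (in nats)."""
    p = p[p > 0]  # convention: 0 log 0 = 0
    return -np.sum(p * np.log(p))

def mutual_information(p_xy):
    """I(x;y) = KL(p(x,y) || p(x)p(y)) for a discrete joint p_xy[i, j]."""
    p_x = p_xy.sum(axis=1, keepdims=True)  # marginal p(x), shape (|X|, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)  # marginal p(y), shape (1, |Y|)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask]))

# Sanity check: I(x;y) = H(x) + H(y) - H(x,y) gives the same value.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
assert np.isclose(
    mutual_information(p_xy),
    entropy(p_xy.sum(axis=1)) + entropy(p_xy.sum(axis=0)) - entropy(p_xy.ravel()),
)
```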
Information-theoretic quantities in RL
- \(\pi(s)\)
- state marginal distribution of policy \(\pi\)
- \(\mathcal{H}(\pi(s))\)
- state marginal entropy of policy \(\pi\)
- empowerment
- \(\mathcal{I}(s_{t+1};a_t) = \mathcal{H}(s_{t+1}) - \mathcal{H}(s_{t+1}|a_t)\)
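To make these concrete, a small tabular sketch (toy setup, all names assumed): a plug-in estimate of \(\mathcal{H}(\pi(s))\) from states visited under \(\pi\), and empowerment at a fixed state where the transition probabilities \(p(s_{t+1}|a_t)\) are known.

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def state_marginal_entropy(visited_states, n_states):
    """Plug-in estimate of H(pi(s)) from integer state indices visited under pi."""
    counts = np.bincount(visited_states, minlength=n_states)
    return entropy(counts / counts.sum())

def empowerment(p_next_given_a, p_a):
    """I(s';a) = H(s') - H(s'|a), with p_next_given_a[a, s'] and action dist p_a."""
    p_next = p_a @ p_next_given_a                      # marginal over s_{t+1}
    h_cond = sum(p_a[a] * entropy(p_next_given_a[a])   # H(s_{t+1}|a_t)
                 for a in range(len(p_a)))
    return entropy(p_next) - h_cond

print(state_marginal_entropy(np.array([0, 1, 2, 2]), n_states=4))  # ~1.04 nats

# Two actions leading to distinct next states -> maximal empowerment, log 2.
p_next_given_a = np.array([[1.0, 0.0],
                           [0.0, 1.0]])
print(empowerment(p_next_given_a, np.array([0.5, 0.5])))  # ~0.693
```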
Papers
- Skew-Fit (Pong et al. 2019)
- maximizes the state marginal entropy \(\mathcal{H}(\pi(s))\) by skewing goal sampling toward rarely visited states
- Diversity Is All You Need (DIAYN) (Eysenbach et al. 2018)
- learns skills without a reward function by maximizing the mutual information between states and a latent skill \(z\) (sketched below)
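DIAYN maximizes a variational lower bound on \(\mathcal{I}(s;z)\), which gives the intrinsic reward \(r = \log q_\phi(z|s) - \log p(z)\): a learned skill discriminator \(q_\phi\) plus a fixed uniform skill prior \(p(z)\). A hedged sketch of that reward term only (the discriminator architecture is illustrative; the paper trains it jointly with a SAC policy):

```python
import math
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """q_phi(z|s): predicts which skill z produced state s."""
    def __init__(self, state_dim, n_skills, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_skills),
        )

    def log_prob(self, state, z):
        # log q_phi(z|s) for integer skill labels z of shape (batch,)
        log_q = torch.log_softmax(self.net(state), dim=-1)
        return log_q.gather(-1, z.unsqueeze(-1)).squeeze(-1)

def diayn_reward(disc, state, z, n_skills):
    """Intrinsic reward r = log q_phi(z|s) - log p(z), with p(z) uniform."""
    with torch.no_grad():
        return disc.log_prob(state, z) + math.log(n_skills)

# usage: reward for a batch of states collected under skill labels z
disc = Discriminator(state_dim=4, n_skills=8)
r = diayn_reward(disc, torch.randn(32, 4), torch.randint(0, 8, (32,)), n_skills=8)
```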
Bibliography
Eysenbach, Benjamin, Abhishek Gupta, Julian Ibarz, and Sergey Levine. 2018. “Diversity Is All You Need: Learning Skills without a Reward Function.” http://arxiv.org/abs/1802.06070v6.
Pong, Vitchyr H., Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, and Sergey Levine. 2019. “Skew-Fit: State-Covering Self-Supervised Reinforcement Learning.” http://arxiv.org/abs/1903.03698v2.