Jethro's Braindump

Information-Theoretic Reinforcement Learning

Can we learn without any reward function at all?

Identities

entropy
H(p(x)) = -E_{x ~ p(x)}[log p(x)]
mutual information
I(x; y) = D_KL(p(x, y) || p(x) p(y))
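A minimal sketch of both identities for discrete distributions, in nats. The function names `entropy` and `mutual_information` are illustrative, not from any paper; the joint is passed as a nested list `p_xy[i][j] = p(x=i, y=j)`.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum_x p(x) log p(x), in nats."""
    return -sum(px * math.log(px) for px in p if px > 0)

def mutual_information(p_xy):
    """I(x; y) = D_KL(p(x, y) || p(x) p(y)) for a joint probability table."""
    p_x = [sum(row) for row in p_xy]          # marginal over x
    p_y = [sum(col) for col in zip(*p_xy)]    # marginal over y
    return sum(
        pij * math.log(pij / (p_x[i] * p_y[j]))
        for i, row in enumerate(p_xy)
        for j, pij in enumerate(row)
        if pij > 0
    )

# A fair coin carries log 2 nats of entropy.
print(entropy([0.5, 0.5]))                                # ≈ 0.693

# Independent variables: the joint factorises, so the KL (and the MI) is 0.
print(mutual_information([[0.15, 0.35], [0.15, 0.35]]))   # 0.0
```

Perfectly correlated variables give the other extreme: `mutual_information([[0.5, 0.0], [0.0, 0.5]])` equals `entropy([0.5, 0.5])`.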

Information theoretic quantities in RL

π(s)
state marginal distribution of policy π
H(π(s))
state marginal entropy of policy π
empowerment
I(s_{t+1}; a_t) = H(s_{t+1}) - H(s_{t+1} | a_t)
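The empowerment identity above can be sketched for a one-step discrete MDP. This is a toy illustration, not an algorithm from the cited papers: `p_a` is an assumed action distribution and `p_s_given_a[k][s]` the transition table p(s_{t+1} = s | a_t = k).

```python
import math

def empowerment(p_a, p_s_given_a):
    """I(s'; a) = H(s') - H(s' | a) for discrete actions and next states."""
    n_states = len(p_s_given_a[0])
    # Marginal over next states: p(s') = sum_a p(a) p(s' | a).
    p_s = [sum(p_a[k] * p_s_given_a[k][s] for k in range(len(p_a)))
           for s in range(n_states)]
    h = lambda p: -sum(x * math.log(x) for x in p if x > 0)
    h_s = h(p_s)                                     # H(s')
    h_s_given_a = sum(p_a[k] * h(p_s_given_a[k])     # H(s' | a), averaged over a
                      for k in range(len(p_a)))
    return h_s - h_s_given_a

# Two actions that deterministically reach distinct states:
# the agent's actions fully determine s', so I(s'; a) = H(a) = log 2.
print(empowerment([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))  # ≈ 0.693

# Actions with identical effects carry no information about s'.
print(empowerment([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]))  # 0.0
```

The two extremes match the intuition behind empowerment as an intrinsic objective: it is maximal when actions reliably steer the environment into distinguishable states, and zero when they have no effect.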

Papers

Bibliography

Eysenbach, Benjamin, Abhishek Gupta, Julian Ibarz, and Sergey Levine. n.d. “Diversity Is All You Need: Learning Skills without a Reward Function.” http://arxiv.org/abs/1802.06070v6.

Pong, Vitchyr H., Murtaza Dalal, Steven Lin, Ashvin Nair, Shikhar Bahl, and Sergey Levine. n.d. “Skew-Fit: State-Covering Self-Supervised Reinforcement Learning.” http://arxiv.org/abs/1903.03698v2.