Generalization In Reinforcement Learning

tags: Reinforcement Learning ⭐

Generalization using the successor representation (Dayan 1993).

Successor features extend this idea to adapt to a new reward structure (Barreto 2018).
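
As a minimal sketch of why this helps when the reward changes, assume a tabular MDP under a fixed policy with state-transition matrix `P_pi` (a hypothetical 3-state chain, not taken from the cited papers). The matrix $M = (I - \gamma P_\pi)^{-1}$ of expected discounted state occupancies does not depend on the reward, so the value function for any reward vector $r$ is simply $M r$:

```python
import numpy as np

def successor_representation(P_pi, gamma):
    """Return M = (I - gamma * P_pi)^{-1}, the expected discounted
    future occupancy of each state, starting from each state."""
    n = P_pi.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * P_pi)

# Hypothetical chain 0 -> 1 -> 2, where state 2 is absorbing.
P_pi = np.array([[0.0, 1.0, 0.0],
                 [0.0, 0.0, 1.0],
                 [0.0, 0.0, 1.0]])
gamma = 0.9

M = successor_representation(P_pi, gamma)  # computed once, reward-free

r_old = np.array([0.0, 0.0, 1.0])  # reward only in the absorbing state
r_new = np.array([1.0, 0.0, 0.0])  # new task: reward only in the start state

v_old = M @ r_old  # values under the old reward
v_new = M @ r_new  # values under the new reward, no re-learning of M
```

Only the reward vector changes between the two tasks; the dynamics-dependent part `M` is reused as-is.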

How many tasks are needed before modern approaches generalize? (Cobbe 2019)

Generalization with selective noise injection and information bottleneck

Insights:

- Selective noise injection: applying noise in the gradient update but not in the behaviour (rollout) policy speeds up learning
- Regularization with an information bottleneck is particularly effective

$\nabla_{\theta} J(\pi_{\theta}) = \hat{E}_{\pi_{\theta}^{r}(a_{t} | x_{t})} \left[ \sum_{t}^{T} \frac{\pi_{\theta}(a_{t} | x_{t})}{\pi_{\theta}^{r}(a_{t} | x_{t})} \nabla_{\theta} \log \pi_{\theta}(a_{t} | x_{t}) \hat{A}_{t} \right]$
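
The estimator above can be sketched for a linear-softmax policy, assuming the injected noise is a dropout mask on the input features (an illustrative choice; the function name `sni_policy_gradient` and all shapes are hypothetical). The rollout policy sees clean features, the update policy sees masked ones, and the importance ratio is treated as a constant weight on the score function:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sni_policy_gradient(W, xs, actions, advantages, drop_mask):
    """Importance-weighted policy-gradient estimate for logits = x @ W.

    W: (d, k) weights, xs: (T, d) observations, actions: (T,) ints,
    advantages: (T,) advantage estimates, drop_mask: (T, d) 0/1 mask.
    Returns the (d, k) gradient estimate and the (T,) importance ratios.
    """
    T, k = xs.shape[0], W.shape[1]
    idx = np.arange(T)

    # Rollout (behaviour) policy: no noise, this is what collected the data.
    p_rollout = softmax(xs @ W)[idx, actions]

    # Update policy: same weights, but with noise injected into the features.
    x_noisy = xs * drop_mask
    probs = softmax(x_noisy @ W)
    p_update = probs[idx, actions]

    # Importance ratio corrects for sampling actions from the rollout policy.
    ratio = p_update / p_rollout

    # For a linear-softmax policy, d log pi(a|x) / dW = outer(x, onehot(a) - probs).
    onehot = np.eye(k)[actions]
    weights = (ratio * advantages)[:, None]
    grad = x_noisy.T @ (weights * (onehot - probs))  # sum over time steps
    return grad, ratio
```

With an all-ones mask the two policies coincide, every ratio equals 1, and the estimate reduces to the ordinary score-function policy gradient.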

Benchmarks

- Multi-Room (Chevalier 2018)
  - No room is seen twice
- CoinRun (Cobbe 2019)
- openai/procgen