Transfer Learning

tags: Reinforcement Learning ⭐

Prior understanding of problem structure can help us solve complex tasks quickly. Solving earlier tasks may therefore provide knowledge that is useful for solving a new task.

Transfer learning is the use of experience from one set of tasks for faster learning and better performance on a new task.

Few-Shot Learning

“Shot” refers to the number of attempts in the target domain. For example, in 0-shot learning, a policy trained in the source domain is run directly in the target domain, with no further training there.

This typically requires assumptions of similarity between the source and target domains.

Taxonomy of Transfer Learning

“forward” transfer

train on one task, transfer to a new task

Randomizing source domain
Fine-tuning

multi-task transfer

train on many tasks, transfer to a new task

generate highly randomized source domains
model-based RL
model distillation
contextual policies
modular policy networks

multi-task meta-learning

learn to learn from many tasks

RNN-based/Gradient-based meta-learning

Forward Transfer

Fine-tuning

Key Idea: Train on the source task, then train some more on the target task, for example, by retraining the weights on the last layer. Lower layers are likely to learn representations from the source task that are useful in the target task. This works well if the source task is broad and diverse.

Fine-tuning is popular in the supervised learning setting.
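A minimal sketch of last-layer fine-tuning, using only the Python standard library. Everything here is an illustrative assumption: the "lower layer" is a fixed set of tanh features standing in for weights learned on a source task, the target task is a toy regression problem, and the names `features`, `fine_tune_head`, and `frozen_weights` are hypothetical.

```python
import math
import random

def features(x, frozen_weights):
    """Frozen lower layer: one tanh unit per (w, b) pair. Never updated."""
    return [math.tanh(w * x + b) for w, b in frozen_weights]

def predict(x, frozen_weights, head):
    return sum(h * f for h, f in zip(head, features(x, frozen_weights)))

def mse(data, frozen_weights, head):
    return sum((predict(x, frozen_weights, head) - y) ** 2 for x, y in data) / len(data)

def fine_tune_head(data, frozen_weights, head, lr=0.05, epochs=200):
    """Full-batch gradient descent on the last-layer weights only."""
    head = list(head)
    for _ in range(epochs):
        grad = [0.0] * len(head)
        for x, y in data:
            f = features(x, frozen_weights)
            err = sum(h * fi for h, fi in zip(head, f)) - y
            for i, fi in enumerate(f):
                grad[i] += 2 * err * fi / len(data)
        head = [h - lr * g for h, g in zip(head, grad)]
    return head

rng = random.Random(0)
# Assumption: these stand in for lower-layer weights learned on the source task.
frozen_weights = [(rng.uniform(-2, 2), rng.uniform(-1, 1)) for _ in range(8)]
new_head = [0.0] * 8  # last layer re-initialized for the target task
target_data = [(x / 10, math.sin(x / 10)) for x in range(-20, 21)]

loss_before = mse(target_data, frozen_weights, new_head)
tuned_head = fine_tune_head(target_data, frozen_weights, new_head)
loss_after = mse(target_data, frozen_weights, tuned_head)
```

Only `tuned_head` changes during training. How low `loss_after` can go depends on how well the frozen features suit the target task, which is why a broad and diverse source task matters.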

Why fine-tuning doesn’t work well for RL

RL tasks tend to be narrow (not broad and diverse), and features are less general
RL methods tend to learn near-deterministic policies, since a deterministic policy is optimal in a fully observed MDP. This has two consequences:
1. Low-entropy policies adapt very slowly to new settings
2. Little exploration at convergence

To increase diversity and entropy, we can use maximum-entropy learning, which acts as randomly as possible while still collecting high rewards (Haarnoja et al., n.d.):

\begin{equation} \pi(a|s) = \mathrm{exp} (Q_\phi(s,a)-V(s)) \end{equation}

This optimizes

\begin{equation} \sum_t \left( E_{\pi(s_t, a_t)}[r(s_t, a_t)] + E_{\pi(s_t)}[\mathcal{H}(\pi(a_t|s_t))] \right) \end{equation}
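As a small illustration of the policy formula above, the sketch below computes a maximum-entropy policy for a single state with a handful of discrete actions. It assumes a temperature of 1 and the usual soft value V(s) = log Σ_a exp Q(s, a), which is what makes the policy sum to one. The Q-values are made up.

```python
import math

def soft_value(q_values):
    """V(s) = log sum_a exp(Q(s, a)), computed with the max subtracted for stability."""
    m = max(q_values)
    return m + math.log(sum(math.exp(q - m) for q in q_values))

def max_ent_policy(q_values):
    """pi(a|s) = exp(Q(s, a) - V(s))."""
    v = soft_value(q_values)
    return [math.exp(q - v) for q in q_values]

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

q = [1.0, 2.0, 0.5]        # hypothetical Q(s, a) for three actions
pi = max_ent_policy(q)     # stochastic, but favors the best action
greedy = [0.0, 1.0, 0.0]   # deterministic argmax policy, for comparison
```

The policy `pi` still puts the most mass on the highest-valued action, but unlike `greedy` it has non-zero entropy, so it keeps exploring the other actions.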

Manipulating the Source Domain

This is used where we can design the source domain (e.g. training in a simulator, which can be tweaked). Injecting randomness/diversity in the source tends to be helpful.
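A rough sketch of randomizing the source domain: draw new simulator parameters for every training episode, so that the policy is evaluated over a distribution of environments rather than a single one. The parameter names, their ranges, and the toy point-mass "simulator" are all assumptions made up for illustration.

```python
import random

def sample_sim_params(rng):
    """Draw one randomized simulator configuration (ranges are made up)."""
    return {
        "mass": rng.uniform(0.5, 2.0),
        "friction": rng.uniform(0.1, 1.0),
    }

def run_episode(params, policy_gain):
    """Hypothetical stand-in for a simulator rollout; returns the episode return.

    A 1-D point mass starts at position 1 and a proportional controller
    pushes it toward the origin. The reward is the negative squared distance.
    """
    pos, vel, total = 1.0, 0.0, 0.0
    for _ in range(50):
        force = -policy_gain * pos - params["friction"] * vel
        vel += 0.1 * force / params["mass"]
        pos += 0.1 * vel
        total -= pos ** 2
    return total

rng = random.Random(0)
configs = [sample_sim_params(rng) for _ in range(20)]   # one config per episode
returns = [run_episode(p, policy_gain=1.0) for p in configs]
avg_return = sum(returns) / len(returns)  # objective averaged over domains
```

Training against `avg_return` rather than the return in one fixed configuration pushes the policy toward behavior that works across the whole range of masses and frictions.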

Resources:

Domain Randomization for Sim2Real Transfer

Multi-task Transfer

Some things remain constant across tasks, such as the laws of physics. Model-based RL may learn these laws, and transfer this knowledge across multiple tasks.

In model distillation, multiple policies are combined into one for concurrent multi-task learning. Policy distillation (Rusu et al., n.d.) can:

Compress policies learnt on single games into smaller models
Build agents capable of playing multiple games
Improve the stability of the DQN learning algorithm by distilling online the policy of the best performing agent
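One way to sketch the distillation step is as a KL divergence between a teacher's action distribution and the student's. The version below assumes the teacher is a Q-network whose outputs are turned into a sharp distribution with a low-temperature softmax; the temperature value, the example numbers, and the function names are illustrative assumptions, not details taken from these notes.

```python
import math

def softmax(xs, temperature=1.0):
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(teacher_q, student_logits, temperature=0.01):
    """KL(teacher || student) for one state.

    The teacher's Q-values are sharpened with a low temperature so that
    the student is trained mostly toward the teacher's preferred action.
    """
    p = softmax(teacher_q, temperature)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_q = [1.0, 3.0, 2.0]           # hypothetical teacher Q-values
agreeing_student = [0.0, 10.0, 0.0]   # prefers the same action as the teacher
disagreeing_student = [10.0, 0.0, 0.0]

low = distillation_loss(teacher_q, agreeing_student)
high = distillation_loss(teacher_q, disagreeing_student)
```

For a multi-game agent, the same loss would be averaged over states drawn from each teacher's game, with a single student network receiving all the gradients.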


Jethro's Braindump