# Transfer Learning

Tags: Reinforcement Learning ⭐

Prior understanding of problem structure can help us solve complex tasks quickly. Solving earlier tasks may therefore let an agent acquire knowledge that is useful for a new task.

Transfer learning is the use of experience from one set of tasks for faster learning and better performance on a new task.

## Few-Shot Learning

“Shot” refers to the number of attempts allowed in the target domain. For example, in 0-shot learning, a policy trained in the source domain is deployed directly in the target domain with no further training.

This typically requires assumptions of similarity between the source and target domains.

## Taxonomy of Transfer Learning

1. “Forward” transfer: train on one task, then transfer to a new task
   - Fine-tuning
   - Randomizing the source domain (generating highly randomized source domains)
2. Multi-task transfer: train on many tasks, then transfer to a new task
   - Model-based RL
   - Model distillation
   - Contextual policies
   - Modular policy networks
3. Meta-learning: learn to learn from many tasks

## Forward Transfer

### Fine-tuning

Key Idea: Train on the source task, then train some more on the target task, for example by retraining only the weights of the last layer. The lower layers are likely to learn representations on the source task that remain useful in the target task. This works well if the source task is broad and diverse.

Fine-tuning is popular in the supervised learning setting.
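To make the recipe concrete, here is a minimal PyTorch sketch of last-layer fine-tuning. The network sizes, target-task data, and hyperparameters are made-up placeholders, not anything from these notes:

```python
import torch
import torch.nn as nn

# Network pretrained on the (broad, diverse) source task. Sizes are illustrative.
model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),   # lower layers: general representations
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 10),              # task-specific head for the source task
)
# ... assume `model` has been trained on the source task here ...

# Freeze the lower layers: keep the transferred representations fixed.
for p in model.parameters():
    p.requires_grad = False

# Replace the last layer and retrain only its weights on the target task.
model[-1] = nn.Linear(128, 5)        # new head; 5 target-task outputs (made up)
optimizer = torch.optim.Adam(model[-1].parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy stand-in for target-task data.
target_data = [(torch.randn(8, 64), torch.randint(0, 5, (8,)))
               for _ in range(10)]
for x, y in target_data:
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```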

Why fine-tuning often doesn’t work well for RL:

1. RL tasks tend to be narrow (not broad and diverse), so the learned features are less general.
2. RL methods tend to learn deterministic policies, i.e. the policies that are optimal in the fully-observed MDP:
   1. Low-entropy policies adapt very slowly to new settings.
   2. There is little exploration at convergence.

To increase diversity and entropy, we can use maximum-entropy RL, which learns to act as randomly as possible while still collecting high rewards <a id="1b37e467d7dc76e365875dfb5c03fa1e" href="#pmlr-v70-haarnoja17a">(Haarnoja et al., 2017)</a>:

$$\pi(a|s) = \mathrm{exp} (Q\_\phi(s,a)-V(s))$$

This optimizes

$$\sum\_t E\_{\pi(s\_t, a\_t)}[r(s\_t, a\_t)] + E\_{\pi(s\_t)}[\mathcal{H}(\pi(a\_t|s\_t))]$$
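For a discrete action space this policy is simply a softmax over the Q-values, since the soft value V(s) is the log-sum-exp of the Q-values. A minimal numpy sketch, with made-up Q-values standing in for the learned network Q_phi:

```python
import numpy as np

def soft_policy(q_values):
    """pi(a|s) = exp(Q(s,a) - V(s)) with the soft value
    V(s) = log sum_a exp Q(s,a), i.e. a softmax over Q-values."""
    q_max = q_values.max()                              # for numerical stability
    v = q_max + np.log(np.exp(q_values - q_max).sum())  # soft value V(s)
    return np.exp(q_values - v)

q = np.array([1.0, 1.2, 0.9])   # illustrative Q_phi(s, .) for 3 actions
pi = soft_policy(q)
print(pi, pi.sum())             # stochastic policy; probabilities sum to 1
# When the Q-values are close, the policy stays high-entropy, preserving
# exploration instead of collapsing to a deterministic argmax.
```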


### Manipulating the Source Domain

This applies when we can design the source domain ourselves (e.g. training in a simulator, which can be tweaked). Injecting randomness and diversity into the source domain tends to help transfer.
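As a concrete illustration, here is a hedged sketch of episode-level randomization of a simulator. The `Simulator` class, its parameters (mass, friction, latency), and the sampling ranges are all hypothetical placeholders:

```python
import random

class Simulator:
    """Hypothetical stand-in for a tweakable physics simulator."""
    def __init__(self, mass, friction, latency):
        self.mass, self.friction, self.latency = mass, friction, latency

def sample_randomized_sim():
    # Ranges are illustrative; in practice they are chosen so the target
    # domain plausibly falls inside the randomization distribution.
    return Simulator(
        mass=random.uniform(0.5, 2.0),
        friction=random.uniform(0.1, 1.0),
        latency=random.choice([0, 1, 2]),  # action delay in timesteps
    )

for episode in range(1000):
    env = sample_randomized_sim()  # fresh dynamics every episode
    # ... collect a rollout in `env` and update the policy on it ...
```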

Resources: