Paper: https://arxiv.org/abs/1804.02464

## Goal

To build networks that are plastic: quick and efficient learning from experience, inspired by synaptic plasticity. This is to bridge the gap with biological agents, which are able to learn quickly from prior experience, mastering environments with changing features.

An alternative to §meta_learning, synaptic plasticity strengthens and weakens connections between neurons based on neural activity: whether they fire together.

Plasticity has traditionally been explored with evolutionary
algorithms, *differential plasticity* allows for learning such
plasticity updates via backpropagation.

## Key Idea

We include an additional plastic component for each neuron. The fixed
part contains regular neuronal weights \(w_{i,j}\), while the plastic
part is stored in a *Hebbian trace* \(\mathrm{Hebb}_{i,j}\).

\begin{equation} x_{i,j} = \sigma \left\{ \sum_{i \in inputs} w_{i,j} x_i (t-1) + \alpha_{i,j} \mathrm{Hebb}_{i,j}(t)x_{i}(t-1) \right\} \end{equation}

where \(\alpha\) is the plasticity coefficient, governing how much of the weight is from the plastic component, and \(\sigma\) is some non-linearity.

The Hebbian trace is updated based on Hebbian dynamics:

\begin{equation} \mathrm{Hebb}_{i,j}(t+1) = \eta x_i(t-1)x_j(t) + (1 - \eta) \mathrm{Hebb}_{i,j}(t) \end{equation}

The Hebbian trace is initialized to zero, at the beginning of each episode, and is purely a lifetime quantity.

\(\eta\) is a weight decay term, to prevent runaway positive feedback on Hebbian traces. In the absence of input, the Hebbian trace decays to zero. One can use Oja’s rule to prevent such runaway divergences, while maintaining stable long-term memories in absence of stimulation.