Jethro's Braindump

Free-Energy Reinforcement Learning

Reinforcement Learning ⭐

This is a framework proposed by Sallans and Hinton in 2004 (Sallans and Hinton 2004). The key insight is that a product of experts (a restricted Boltzmann machine, RBM) can represent the value function, and its parameters can be learnt efficiently because the free energy and its derivatives can be computed efficiently.

Figure 1: Free-energy RL

The weights of the RBM are adjusted so that the negative free energy of a network configuration approximates the value (expected reward) of the given state-action pair.
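A minimal sketch of this idea, assuming binary units and a visible layer that concatenates the state and action bits. All names (`W`, `b_vis`, `b_hid`, `td_update`, the learning rate) are illustrative, not taken from the paper; the update nudges the weights so that the negative free energy tracks a target value, in the spirit of a temporal-difference rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid = 6, 4          # visible layer = state bits + action bits (illustrative sizes)
W = rng.normal(0, 0.1, (n_vis, n_hid))
b_vis = np.zeros(n_vis)
b_hid = np.zeros(n_hid)

def free_energy(v):
    """F(v) = -b_vis.v - sum_j softplus(b_hid_j + v.W_:j) for a binary RBM."""
    x = b_hid + v @ W
    return -v @ b_vis - np.sum(np.logaddexp(0.0, x))

def td_update(v, target_q, lr=0.05):
    """Move the value estimate Q(v) = -F(v) toward target_q by gradient ascent."""
    global W, b_vis, b_hid
    err = target_q - (-free_energy(v))               # TD-style error
    h_hat = 1.0 / (1.0 + np.exp(-(b_hid + v @ W)))   # expected hidden activations
    W += lr * err * np.outer(v, h_hat)               # d(-F)/dW  = v h_hat^T
    b_vis += lr * err * v                            # d(-F)/db_vis = v
    b_hid += lr * err * h_hat                        # d(-F)/db_hid = h_hat
    return err
```

The gradients of the free energy only involve the visible vector and the expected hidden activations, which is what makes the update cheap to compute.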

An action is selected by performing Gibbs sampling while holding the state variables fixed. The action with the lowest free energy is chosen, corresponding to the highest expected reward for the given state.
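As an illustration of the selection step, assuming a small discrete action set encoded as a few binary units: rather than Gibbs sampling, every candidate action can be scored exhaustively, and the greedy choice is the one minimizing the free energy (maximizing Q = -F). The function names and the tiny dimensions are hypothetical.

```python
import numpy as np
from itertools import product

def free_energy(v, W, b_vis, b_hid):
    """Free energy of a binary RBM for visible vector v = (state, action)."""
    x = b_hid + v @ W
    return -v @ b_vis - np.sum(np.logaddexp(0.0, x))

def greedy_action(state, W, b_vis, b_hid, n_action):
    """Enumerate all binary action vectors; return the one with lowest F."""
    best_a, best_f = None, np.inf
    for bits in product([0.0, 1.0], repeat=n_action):
        v = np.concatenate([state, bits])
        f = free_energy(v, W, b_vis, b_hid)
        if f < best_f:
            best_a, best_f = np.array(bits), f
    return best_a, best_f
```

Exhaustive evaluation is only feasible for a handful of action units; with larger action spaces the Gibbs sampling used in the original framework becomes necessary.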

Spiking neural networks can be used to implement RBMs, and hence to implement FERL (Nakano and Otsuka 2011).


Nakano, Takashi, and Makoto Otsuka. 2011. “Spiking Neural Network Model of Free-Energy-Based Reinforcement Learning.” BMC Neuroscience 12 (S1):P244.

Sallans, Brian, and Geoffrey Hinton. 2004. “Reinforcement Learning with Factored States and Actions.” Journal of Machine Learning Research 5 (August):1063–88.