Temporal Difference Learning

Observe samples \(\left(s_t, a_t, r_t, s_{t+1} \right)\). If the value estimates are accurate, the following must hold in expectation over transitions:

\begin{equation} V(s_t) = r_t + \gamma V(s_{t+1}) \end{equation}

If not, there is a TD error:

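\begin{equation} \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) \end{equation}

To learn better estimates, shrink the TD error \(\delta_t\) by moving \(V(s_t)\) a step of size \(\alpha\) toward the target \(r_t + \gamma V(s_{t+1})\). This is the TD(0) update: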

\begin{equation} V(s_t) \leftarrow V(s_t) + \alpha \left( r_t + \gamma V(s_{t+1}) - V(s_t) \right) \end{equation}
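
The update above can be sketched in a few lines of Python for a tabular value function. This is a minimal illustration, not a reference implementation: the function name `td0_update`, the `terminal` flag (which drops the bootstrap term at the end of an episode), the default values of `alpha` and `gamma`, and the two-step toy episode are all assumptions made for the example. The action \(a_t\) is omitted because the state-value update does not use it.

```python
from collections import defaultdict


def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one TD(0) update to V in place and return the TD error."""
    # Target: reward plus discounted value of the next state.
    # Assumption: a terminal next state contributes no future value.
    target = r + (0.0 if terminal else gamma * V[s_next])
    delta = target - V[s]   # TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
    V[s] += alpha * delta   # move V(s_t) a step of size alpha toward the target
    return delta


# Hypothetical deterministic episode: A -> B -> end, with reward 1 on the last step.
# Each entry is (s_t, r_t, s_{t+1}, terminal).
episode = [("A", 0.0, "B", False), ("B", 1.0, "END", True)]

V = defaultdict(float)  # all value estimates start at 0
for _ in range(200):
    for s, r, s_next, terminal in episode:
        td0_update(V, s, r, s_next, terminal=terminal)
```

With repeated sweeps over this toy episode, the estimate for `B` approaches the final reward and the estimate for `A` approaches the discounted value of `B`, at which point the TD errors are close to zero.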

