Temporal Difference Learning
Observe samples \(\left(s_t, a_t, r_t, s_{t+1} \right)\). If value estimates are accurate, the following must hold:
\begin{equation} V(s_t) = r_t + \gamma V(s_{t+1}) \end{equation}
If not, there is a TD error:
\begin{equation} \gamma = r_t + \gamma V(s_{t+1}) - V(s_t) \end{equation}
To learn better estimates - minimize $γ $ TD(0):
\begin{equation} V(s) \leftarrow V(s) + \alpha \left( r_t + \gamma V(s_{t+1}) - V(s_t) \right) \end{equation}