Observe samples $(s_t, a_t, r_t, s_{t+1})$. If the value estimates are accurate, the following must hold (in expectation, when rewards or transitions are stochastic):
$V(s_t) = r_t + \gamma V(s_{t+1})$
If not, there is a TD error:
$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)$
To learn better estimates, minimize $\delta_t$. TD(0):
$V(s_t) \leftarrow V(s_t) + \alpha \left( r_t + \gamma V(s_{t+1}) - V(s_t) \right)$
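The update above can be sketched for a tabular value function as follows. This is a minimal illustration, not a reference implementation: the function name `td0_update`, the dictionary-based table `V`, the default values of `alpha` and `gamma`, and the three-state chain are all assumptions made for the example.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one TD(0) update to the table V in place and return the TD error."""
    # TD error: gap between the one-step target and the current estimate.
    delta = r + gamma * V[s_next] - V[s]
    # Move the estimate for s a fraction alpha of the way toward the target.
    V[s] += alpha * delta
    return delta


# Hypothetical deterministic chain: 0 -> 1 -> 2, where state 2 is terminal
# (its value stays 0) and the transition 1 -> 2 yields reward 1.
V = {0: 0.0, 1: 0.0, 2: 0.0}
for _ in range(2000):
    td0_update(V, s=0, r=0.0, s_next=1)
    td0_update(V, s=1, r=1.0, s_next=2)
```

In this toy chain the estimates should settle near V(1) = 1 and V(0) = γ · V(1) = 0.9, at which point the TD error on both transitions is close to zero.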