Do reinforcement learning and optimal control provide a reasonable
model of human behaviour? Is there a better explanation?

For example, can human gait be explained as the result of optimizing
some objective?

If we assume that observed behaviour comes from a policy that
maximizes the expected total reward under some given dynamics, can we
recover the human's reward function from that data?

We can model sub-optimal behaviour using techniques from graphical
models (§control_as_inference), and use this framework to derive new
“soft” RL algorithms (keywords: soft optimality).
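To make "soft" optimality concrete, here is a minimal sketch of finite-horizon soft value iteration, one common way the idea is instantiated. The hard `max` over actions in the Bellman backup is replaced by a log-sum-exp, and the resulting policy is stochastic, with π(a | s) ∝ exp(Q(s, a)). Sub-optimal actions therefore keep non-zero probability, which is what lets this kind of model describe imperfect human behaviour. The function name `soft_value_iteration`, the tabular representation, the deterministic dynamics (`next_state`), and the unit temperature are all assumptions made for illustration. With stochastic dynamics, the naive inference backup becomes optimistic and needs extra care.

```python
import numpy as np

def soft_value_iteration(rewards, next_state, horizon):
    """Finite-horizon soft value iteration (hypothetical helper, illustrative only).

    Assumes a tabular MDP with deterministic dynamics and temperature 1.
    rewards[s, a]    : reward r(s, a)
    next_state[s, a] : index of the state reached by taking action a in state s
    Returns a list of stochastic policies pi_t[s, a], one per time step.
    """
    n_states, _ = rewards.shape
    v_next = np.zeros(n_states)  # soft value after the final step is zero
    policies = []
    for _ in range(horizon):
        # Soft Q backup: Q(s, a) = r(s, a) + V(s')
        q = rewards + v_next[next_state]
        # Soft maximum over actions, computed stably: V(s) = log sum_a exp Q(s, a)
        q_max = q.max(axis=1, keepdims=True)
        v = (q_max + np.log(np.exp(q - q_max).sum(axis=1, keepdims=True))).ravel()
        # Boltzmann policy: pi(a | s) = exp(Q(s, a) - V(s)); each row sums to 1
        policies.append(np.exp(q - v[:, None]))
        v_next = v
    policies.reverse()  # policies[0] is the policy for the first time step
    return policies

# Usage: one state, two actions with rewards 1 and 0.
rewards = np.array([[1.0, 0.0]])
next_state = np.array([[0, 0]])  # both actions return to state 0
pi = soft_value_iteration(rewards, next_state, horizon=1)
print(pi[0])  # approx. [[0.731 0.269]]
```

In the usage example, the lower-reward action is still chosen about 27% of the time instead of never. Under a hard `max` the policy would be deterministic.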

## Resources