Do reinforcement learning and optimal control provide a reasonable model of human behaviour? Is there a better explanation?
For example, is the gait of a human being optimizing for a certain objective?
If we assume that the human's policy maximizes the expected total reward under some given dynamics, can we learn the human's reward function from observed behaviour?
We can model sub-optimal behaviour using techniques from graphical models (§control_as_inference), and use this framework to derive new “soft” RL algorithms (keywords: soft optimality).
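To make "soft optimality" concrete, here is a minimal sketch of a soft value iteration on a small tabular MDP. It is an illustration under stated assumptions, not the derivation these notes refer to. The hard max in the Bellman backup is replaced by a log-sum-exp, and the resulting policy picks actions with probability proportional to exp(Q), so worse actions remain possible but become exponentially less likely. The function name `soft_value_iteration`, the array layout of `P` and `R`, and the toy numbers are all hypothetical. The backup uses the expected next-state value, which is the maximum-entropy form and is assumed here.

```python
import numpy as np

def soft_value_iteration(P, R, horizon):
    """Soft (log-sum-exp) value iteration on a tabular MDP.

    P: array of shape (S, A, S), P[s, a, s2] = probability of s -> s2 under a.
    R: array of shape (S, A), reward for taking action a in state s.
    horizon: number of backups to run (must be >= 1).
    """
    n_states, _ = R.shape
    V = np.zeros(n_states)
    for _ in range(horizon):
        # Assumption: expected next-state value (maximum-entropy backup).
        Q = R + P @ V
        # Soft maximum over actions, shifted by the row max for stability.
        m = Q.max(axis=1, keepdims=True)
        V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).squeeze(1)
    # Soft-optimal policy: pi(a | s) = exp(Q(s, a) - V(s)).
    policy = np.exp(Q - V[:, None])
    return V, Q, policy

# Toy two-state, two-action MDP (made-up numbers for illustration only).
# Action 0 stays in the current state, action 1 switches state.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0],
              [0.5, 0.0]])
V, Q, policy = soft_value_iteration(P, R, horizon=10)
```

Each row of `policy` is a distribution over actions. Because log-sum-exp upper-bounds the max, the soft value `V` is never below the best Q-value in each state, and every action keeps nonzero probability, which is what lets this kind of model account for sub-optimal behaviour.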