An option is defined as a tuple containing:
- An initiation function (precondition)
- An internal policy (behaviour)
- A termination function (post-condition)
Defining options this way helps put learning and planning algorithms at the same level of abstraction (Stolle and Precup, n.d.).
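As a rough illustration, the three components can be written down as a small data structure. This is a minimal sketch, not code from Stolle and Precup: the names `Option`, `run_option`, `walk_right`, and `corridor_step`, the integer state encoding, and the toy corridor environment are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int   # assumption: states are plain integers
Action = int  # assumption: actions are plain integers


@dataclass
class Option:
    can_start: Callable[[State], bool]    # initiation function (precondition)
    policy: Callable[[State], Action]     # internal policy (behaviour)
    stop_prob: Callable[[State], float]   # termination function (post-condition)


def run_option(
    option: Option,
    state: State,
    step: Callable[[State, Action], Tuple[State, float]],
    rng: random.Random,
    max_steps: int = 1000,
) -> Tuple[State, List[float]]:
    """Follow the option's policy until it terminates.

    Returns the state where the option stopped and the rewards collected.
    """
    if not option.can_start(state):
        raise ValueError(f"option cannot be initiated in state {state}")
    rewards: List[float] = []
    for _ in range(max_steps):
        state, reward = step(state, option.policy(state))
        rewards.append(reward)
        # Terminate with the probability the termination function assigns
        # to the state just reached.
        if rng.random() < option.stop_prob(state):
            break
    return state, rewards


# Hypothetical toy environment: a corridor with states 0..5 and a cost of
# 1 per step; the action is the displacement added to the state.
def corridor_step(state: State, action: Action) -> Tuple[State, float]:
    return state + action, -1.0


# Hypothetical option: "walk right until the end of the corridor".
walk_right = Option(
    can_start=lambda s: s < 5,
    policy=lambda s: 1,
    stop_prob=lambda s: 1.0 if s == 5 else 0.0,
)
```

Calling `run_option(walk_right, 2, corridor_step, random.Random(0))` walks from state 2 to state 5 and collects three step rewards. Representing termination as a probability rather than a boolean is a design choice that keeps stochastic termination possible.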
Models of Actions vs Models of Options
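- Models of actions consist of the expected immediate reward and the transition probabilities to the next state
- Models of options consist of the expected (discounted) reward accumulated until termination, and the (discounted) transition probabilities to the state in which the option terminates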
Option models look a lot like value functions, so they can be trained using the TD error (Temporal Difference Learning).
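One way to make the TD connection concrete is a tabular one-step update applied while the option is being followed. This is a sketch under assumptions, not the method of the cited paper: the function names, the list-based tables `R` and `P`, and the deterministic corridor (states 0..5, reward -1 per step, option terminates in state 5) are all hypothetical.

```python
from typing import List, Tuple


def td_update_option_model(
    R: List[float],
    P: List[List[float]],
    s: int,
    r: float,
    s_next: int,
    beta_next: float,
    alpha: float,
    gamma: float,
) -> None:
    """One TD update of an option's reward model R and transition model P.

    R[s]    estimates the discounted reward collected until termination.
    P[s][x] estimates the discounted probability of terminating in state x.
    beta_next is the probability that the option terminates in s_next.
    """
    # Reward model: bootstrap from R[s_next] only if the option continues.
    td_error = r + gamma * (1.0 - beta_next) * R[s_next] - R[s]
    R[s] += alpha * td_error

    # Transition model: one TD update per candidate termination state x.
    for x in range(len(P[s])):
        terminates_here = 1.0 if x == s_next else 0.0
        target = gamma * (
            beta_next * terminates_here + (1.0 - beta_next) * P[s_next][x]
        )
        P[s][x] += alpha * (target - P[s][x])


def learn_corridor_model(
    n_states: int = 6,
    episodes: int = 200,
    alpha: float = 0.5,
    gamma: float = 0.9,
) -> Tuple[List[float], List[List[float]]]:
    """Learn the model of a 'walk right until the last state' option."""
    R = [0.0] * n_states
    P = [[0.0] * n_states for _ in range(n_states)]
    last = n_states - 1
    for _ in range(episodes):
        s = 0
        while s < last:
            s_next = s + 1                            # deterministic move right
            beta_next = 1.0 if s_next == last else 0.0
            td_update_option_model(R, P, s, -1.0, s_next, beta_next, alpha, gamma)
            s = s_next
    return R, P
```

With the defaults above, `R[2]` should approach -1 - 0.9 - 0.81 = -2.71 and `P[2][5]` should approach 0.9^3 = 0.729. The reward update has the same shape as a TD(0) value update, which is why option models resemble value functions.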
Stolle, Martin, and Doina Precup. n.d. “Learning Options in Reinforcement Learning.” In International Symposium on Abstraction, Reformulation, and Approximation, 212–23. Springer.