Jethro's Braindump

Options Framework

An option is defined as a tuple containing:

  1. An initiation function (precondition)
  2. An internal policy (behaviour)
  3. A termination function (post-condition)
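The tuple above can be sketched as a small data structure. This is only an illustration, not code from the cited paper: the names `Option`, `can_start`, `should_stop`, `run_option`, and the `step` environment function are all hypothetical, and termination is simplified to a deterministic yes/no test rather than a termination probability.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = int
Action = int


@dataclass
class Option:
    can_start: Callable[[State], bool]    # initiation function (precondition)
    policy: Callable[[State], Action]     # internal policy (behaviour)
    should_stop: Callable[[State], bool]  # termination function (post-condition)


def run_option(
    option: Option,
    state: State,
    step: Callable[[State, Action], Tuple[State, float]],
    max_steps: int = 100,
) -> Tuple[State, float, int]:
    """Follow the option's policy until its termination function fires.

    `step(state, action) -> (next_state, reward)` is an assumed environment
    interface. Returns the final state, the summed (undiscounted) reward,
    and the number of primitive steps taken.
    """
    if not option.can_start(state):
        raise ValueError("option cannot be initiated in this state")
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        action = option.policy(state)
        state, reward = step(state, action)
        total_reward += reward
        steps += 1
        if option.should_stop(state):
            break
    return state, total_reward, steps
```

From the caller's point of view, `run_option` behaves like a single action that takes several time steps, which is what lets a planner or learner treat options and primitive actions uniformly.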

Treating options as temporally extended actions puts learning and planning algorithms at the same level of abstraction (Stolle & Precup, 2002).

Models of Actions vs Models of Options

  • models of actions consist of the expected immediate reward and the transition probabilities to the next state
  • models of options consist of the expected (discounted) reward accumulated until termination, and the discounted probability of terminating in each state

Option models look a lot like value functions, so they can be trained with the TD error in the same way (see §td_learning).
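As a rough sketch of what training an option model with a TD error could look like in the tabular case: the update below bootstraps from the next state's model estimate only when the option continues. The rule is the usual Bellman-style recursion for option models and is an assumption here, not something spelled out in these notes; the function names, the chain environment, and the parameter values are hypothetical.

```python
def td_update_option_model(R, P, s, r, s_next, beta_next, gamma=0.9, alpha=0.5):
    """One TD-style update of a tabular model for a single option.

    R[s]      estimates the discounted reward collected from s until termination.
    P[s][x]   estimates the discounted probability of terminating in state x.
    beta_next is the option's termination probability in s_next.
    """
    cont = 1.0 - beta_next  # probability that the option keeps running
    # Reward model: bootstrap from R[s_next] only if the option continues.
    reward_target = r + gamma * cont * R[s_next]
    R[s] += alpha * (reward_target - R[s])
    # Transition model: one TD error per candidate termination state x.
    for x in range(len(P[s])):
        terminated_here = beta_next if x == s_next else 0.0
        target = gamma * (terminated_here + cont * P[s_next][x])
        P[s][x] += alpha * (target - P[s][x])


def learn_chain_model(n_states=4, gamma=0.9, alpha=0.5, episodes=200):
    """Toy run: the option walks right on a chain and stops at the last state.

    Every primitive step yields reward 1.0 (an arbitrary choice for the demo).
    """
    R = [0.0] * n_states
    P = [[0.0] * n_states for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            s_next = s + 1
            beta_next = 1.0 if s_next == goal else 0.0
            td_update_option_model(R, P, s, 1.0, s_next, beta_next, gamma, alpha)
            s = s_next
    return R, P
```

On this deterministic four-state chain the estimates approach `R[0] = 1 + 0.9 + 0.81 = 2.71` and `P[0][3] = 0.9 ** 3 = 0.729`, i.e. the discounted reward until termination and the discounted probability of terminating in the last state.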


Stolle, M., & Precup, D. (2002). Learning options in reinforcement learning. In International Symposium on Abstraction, Reformulation, and Approximation (pp. 212–223).
