How can we make decisions if we know the dynamics of the environment?

## Stochastic optimization

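Stochastic optimization can be used for open-loop planning, where the entire sequence of actions is chosen in advance instead of being adjusted based on the states that are actually reached.

We wish to choose \(a_1, \dots, a_T = \mathrm{argmax}_{a_1, \dots, a_T} J(a_1, \dots, a_T)\) for some objective \(J\). Writing \(A = (a_1, \dots, a_T)\) for a whole action sequence, the methods below search for \(\mathrm{argmax}_A J(A)\) by sampling candidate sequences and evaluating \(J\) on them.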
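Because the dynamics are known, \(J\) can be evaluated by simulation. A natural objective is the total reward of the trajectory the actions induce. The following symbols are illustrative assumptions and are not named elsewhere in these notes: deterministic dynamics \(s_{t+1} = f(s_t, a_t)\), a reward function \(r(s_t, a_t)\), and a given initial state \(s_1\). With these, the objective is:

\[
J(a_1, \dots, a_T) = \sum_{t=1}^{T} r(s_t, a_t), \qquad s_{t+1} = f(s_t, a_t).
\]

The sampling methods below only need to evaluate \(J\) at candidate action sequences. They never need its gradient.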

### Guess and Check

An extremely simple method. It is trivially parallelizable, since every candidate can be evaluated independently:

- sample action sequences \(A_1, \dots, A_N\) from some distribution (e.g. uniform)
- choose \(A_i\) where \(i = \mathrm{argmax}_i J(A_i)\).
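The two steps above can be sketched in a few lines. This sketch rests on some assumptions. It uses one-dimensional actions drawn uniformly from `[low, high]`. The function name `guess_and_check` and the toy objective `toy_J` are hypothetical and exist only for illustration:

```python
import random
from typing import Callable, List, Sequence


def guess_and_check(
    J: Callable[[Sequence[float]], float],
    horizon: int,
    num_samples: int,
    low: float = -1.0,
    high: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Sample num_samples action sequences and return the one with the highest J."""
    rng = random.Random(seed)
    # Step 1: draw N candidate sequences A_1..A_N, each of length T (uniform here).
    candidates = [
        [rng.uniform(low, high) for _ in range(horizon)]
        for _ in range(num_samples)
    ]
    # Step 2: evaluate J on every candidate and keep the argmax.
    # Each evaluation is independent, which is what makes this easy to parallelize.
    return max(candidates, key=J)


# Hypothetical toy objective: reward action sequences close to 0.5 at every step.
def toy_J(actions: Sequence[float]) -> float:
    return -sum((a - 0.5) ** 2 for a in actions)


best = guess_and_check(toy_J, horizon=3, num_samples=1000)
print(best, toy_J(best))
```

The quality of the answer depends entirely on how many samples land near the optimum. This is why the method degrades quickly as the horizon \(T\) (the dimension of \(A\)) grows.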

### Cross-entropy Method (CEM)

- sample \(A_1, \dots, A_N\) from some initial distribution \(p(A)\)
- evaluate \(J(A_1), \dots, J(A_N)\)
- pick the elites \(A_{i_1}, \dots, A_{i_M}\), the \(M < N\) samples with the highest values
- refit \(p(A)\) to the elites, and repeat from the first step with the updated distribution

With continuous actions, a multivariate normal distribution is a common choice for \(p(A)\). In the discrete case, Monte Carlo tree search (§mcts) is typically used.
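The CEM loop can be sketched as follows. This is a minimal sketch, assuming scalar actions at each time step. It models \(p(A)\) as a Gaussian with diagonal covariance, which is a special case of the multivariate normal; a full covariance would also capture correlations between time steps. The name `cem_plan` and its defaults (`num_samples`, `num_elites`, `iterations`, and the `min_std` floor) are hypothetical, illustrative choices rather than prescribed values:

```python
import random
import statistics
from typing import Callable, List, Sequence


def cem_plan(
    J: Callable[[Sequence[float]], float],
    horizon: int,
    num_samples: int = 100,
    num_elites: int = 10,
    iterations: int = 20,
    init_std: float = 1.0,
    min_std: float = 1e-3,
    seed: int = 0,
) -> List[float]:
    """Cross-entropy method with a diagonal Gaussian p(A); returns the final mean."""
    rng = random.Random(seed)
    # p(A): one independent Gaussian per time step (diagonal covariance).
    mean = [0.0] * horizon
    std = [init_std] * horizon
    for _ in range(iterations):
        # Sample N action sequences from the current p(A).
        samples = [
            [rng.gauss(mean[t], std[t]) for t in range(horizon)]
            for _ in range(num_samples)
        ]
        # Evaluate J on every sample and keep the M highest-scoring ones (the elites).
        elites = sorted(samples, key=J, reverse=True)[:num_elites]
        # Refit p(A) to the elites: per-time-step mean and standard deviation.
        for t in range(horizon):
            column = [e[t] for e in elites]
            mean[t] = statistics.fmean(column)
            # Floor the std so the distribution does not collapse to a point.
            std[t] = max(statistics.pstdev(column), min_std)
    return mean


# Hypothetical toy objective with a known optimum at TARGET.
TARGET = [0.5, -0.3, 0.8]


def toy_J(actions: Sequence[float]) -> float:
    return -sum((a - g) ** 2 for a, g in zip(actions, TARGET))


plan = cem_plan(toy_J, horizon=3)
print(plan)
```

Returning the mean of the final distribution is one common choice. Returning the best elite seen so far is another.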

## Using Derivatives

- Differential Dynamic Programming (DDP)
- Linear-quadratic regulator (LQR)
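To give a flavor of the derivative-based side, here is a minimal finite-horizon LQR sketch for the scalar case. It assumes linear dynamics \(x_{t+1} = a x_t + b u_t\) and a quadratic cost \(\sum_t (q x_t^2 + r u_t^2) + q_f x_T^2\) to be minimized, which is equivalent to maximizing its negative as \(J\). All names here (`scalar_lqr`, `rollout_cost`, and the parameters) are hypothetical.

A backward Riccati recursion yields time-varying gains with \(u_t = -K_t x_t\). Unlike the open-loop methods above, this is a feedback policy. DDP runs the same kind of backward pass on a local quadratic approximation of nonlinear dynamics and cost, which is where the derivatives come in.

```python
from typing import List, Sequence, Tuple


def scalar_lqr(
    a: float, b: float, q: float, r: float, q_final: float, horizon: int
) -> Tuple[List[float], List[float]]:
    """Backward Riccati recursion for x_{t+1} = a*x_t + b*u_t.

    Returns gains K[t] (optimal control u_t = -K[t] * x_t) and value
    coefficients P[t] (optimal cost-to-go from state x at time t is P[t] * x**2).
    """
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = q_final  # terminal cost q_f * x_T^2
    for t in reversed(range(horizon)):
        denom = r + b * b * P[t + 1]
        # Minimizing q x^2 + r u^2 + P[t+1] (a x + b u)^2 over u gives u = -K[t] x.
        K[t] = a * b * P[t + 1] / denom
        # Substituting that u back in gives the quadratic cost-to-go at time t.
        P[t] = q + a * a * P[t + 1] - (a * b * P[t + 1]) ** 2 / denom
    return K, P


def rollout_cost(
    a: float, b: float, q: float, r: float, q_final: float,
    gains: Sequence[float], x0: float,
) -> float:
    """Total cost of applying u_t = -gains[t] * x_t starting from x0."""
    x, cost = x0, 0.0
    for k in gains:
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q_final * x * x


K, P = scalar_lqr(a=1.2, b=0.5, q=1.0, r=0.1, q_final=1.0, horizon=10)
# The simulated cost of the LQR policy should match the predicted P[0] * x0**2.
print(rollout_cost(1.2, 0.5, 1.0, 0.1, 1.0, K, x0=2.0), P[0] * 2.0 ** 2)
```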