Hindsight Experience Replay
Key Challenges
It is challenging for agents to learn in environments where rewards are sparse. It is therefore desirable to design algorithms that do not require manual reward engineering.
Key Insight
Consider a state sequence produced by an episode that fails to reach the original goal. The key idea is to replay the trajectory with additional goals substituted in hindsight, for example the state actually achieved at the end of the episode, storing the transitions as if that achieved state had been the goal all along. This modification results in at least half of the replayed trajectories containing meaningful rewards, and makes learning possible. HER builds on Universal Value Function Approximators (UVFAs): policies and value functions that take as input both a state and a goal, so a single network can generalize across goals.
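The relabeling step can be sketched as follows. This is a minimal illustration, not the paper's reference implementation: `her_relabel` and `reward_fn` are hypothetical names, the episode is a list of `(state, action, next_state, goal)` tuples, and the "future" goal-sampling strategy is assumed.

```python
import random

def her_relabel(episode, reward_fn, k=4):
    """Relabel an episode's transitions with achieved goals (a 'future'
    strategy sketch): each transition is stored once with the original
    goal and k more times with goals drawn from states achieved later
    in the same episode."""
    relabeled = []
    T = len(episode)
    for t, (state, action, next_state, goal) in enumerate(episode):
        # Original transition keeps the (likely sparse) original reward.
        relabeled.append((state, action, reward_fn(next_state, goal),
                          next_state, goal))
        # Sample k achieved states from the rest of the episode as new goals.
        for _ in range(k):
            future = random.randint(t, T - 1)
            new_goal = episode[future][2]  # achieved state becomes the goal
            relabeled.append((state, action, reward_fn(next_state, new_goal),
                              next_state, new_goal))
    return relabeled
```

Because each transition's final achieved state can be chosen as its own goal, at least some relabeled transitions are guaranteed to receive a non-failure reward, which is what makes learning from otherwise reward-free episodes possible.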
HER as Implicit Curriculum
Goals used for replay naturally shift from simple goals, achievable even by a random agent, to more difficult ones. HER has no explicit control over the distribution of initial environment states.