Proximal policy optimization
Proximal policy optimization (PPO) is a reinforcement learning (RL) algorithm for training an intelligent agent. Specifically, it is a policy gradient method, often used for deep RL when the policy network is very large.