Better exploration with parameter noise
Proximal Policy Optimization
Robust adversarial inputs
Hindsight Experience Replay
Teacher–student curriculum learning
Faster physics in Python
Learning from human preferences
Learning to cooperate, compete, and communicate
UCB exploration via Q-ensembles
OpenAI Baselines: DQN