More on Dota 2
Dota 2
Gathering human feedback
Better exploration with parameter noise
Proximal Policy Optimization
Robust adversarial inputs
Hindsight Experience Replay
Teacher–student curriculum learning
Faster physics in Python
Learning from human preferences