Learning from human preferences
Learning to cooperate, compete, and communicate
UCB exploration via Q-ensembles
OpenAI Baselines: DQN
Robots that learn
Roboschool
Equivalence between policy gradients and soft Q-learning
Stochastic Neural Networks for hierarchical reinforcement learning
Unsupervised sentiment neuron
Spam detection in the physical world