Summary of "Playing Atari with Deep Reinforcement Learning"
Recent breakthroughs in computer vision and speech recognition motivated applying a similar deep learning approach to reinforcement learning: connecting a reinforcement learning algorithm to a deep neural network that operates directly on raw inputs such as images. The proposed Deep Q-Network (DQN) algorithm uses experience replay, in which the agent's experiences (state, action, reward, next state) are stored in a replay memory holding the one million most recent frames. During training, minibatches of 32 transitions are sampled at random from the replay memory and used to perform Q-learning updates on the network weights with the RMSProp algorithm.

The Q-function is approximated by a deep convolutional neural network whose input is the current state: an 84x84x4 image built from the last 4 preprocessed frames. The architecture consists of:
1) A convolutional layer with 16 8x8 filters and stride 4, followed by a rectifier nonlinearity.
2) A convolutional layer with 32 4x4 filters and stride 2, followed by a rectifier nonlinearity.
3) A fully-connected hidden layer with 256 rectifier units.
4) An output layer with one linear unit per valid action (between 4 and 18 actions, depending on the game).

During training, the agent selects actions with an epsilon-greedy policy based on the current Q-network, with epsilon annealed linearly from 1 to 0.1 over the first million frames and fixed at 0.1 thereafter. After taking an action and observing the reward and next state, the resulting experience is stored in the replay memory. Rewards are clipped: positive rewards are set to 1, negative rewards to -1, and 0 rewards are left unchanged. This limits the scale of the errors and allows the same learning rate to be used across all games, although it may hurt performance because the agent cannot differentiate between rewards of different magnitudes. A frame-skipping technique is also used, with the chosen action repeated for k frames: k=4 for most games and k=3 for Space Invaders, where a larger skip makes the lasers invisible.

Experiments were run on 7 Atari games (Beam Rider, Breakout, Enduro, Pong, Q*bert, Seaquest, Space Invaders) using the same neural network architecture, learning algorithm, and settings for all games, demonstrating the approach's generality. DQN outperformed previous methods on six of the seven games, achieved better performance than an expert human player on Breakout, Enduro, and Pong, and came close to human performance on Beam Rider, but it remained far from human performance on Q*bert, Seaquest, and Space Invaders, which likely require strategies extending over long time scales.
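To make the architecture concrete, here is a minimal PyTorch sketch; the framework, class name, and variable names are illustrative assumptions, and only the layer sizes come from the paper:

```python
import torch
import torch.nn as nn


class DQN(nn.Module):
    """Q-network with the layer sizes described above (hypothetical PyTorch port)."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            # Input: 4 stacked 84x84 preprocessed frames.
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # -> 16 x 20 x 20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # -> 32 x 9 x 9
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),                  # fully-connected hidden layer
            nn.ReLU(),
            nn.Linear(256, n_actions),                   # one linear output per valid action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 4, 84, 84) stack of preprocessed frames.
        return self.net(x)
```

Because the network outputs one Q-value per action, a single forward pass scores every valid action for the current state.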
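The training step can be sketched as follows, combining experience replay, the epsilon-greedy schedule, reward clipping, and the Q-learning update. This is a hedged approximation rather than the authors' code: names like ReplayMemory and select_action, and the discount factor of 0.99, are assumptions.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity: int = 1_000_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size: int = 32):
        return random.sample(self.buffer, batch_size)


def select_action(q_net, state, n_actions, step, anneal_steps=1_000_000):
    # Epsilon annealed linearly from 1.0 to 0.1 over the first million frames, fixed afterwards.
    epsilon = max(0.1, 1.0 - 0.9 * step / anneal_steps)
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return q_net(state.unsqueeze(0)).argmax(dim=1).item()


def q_learning_update(q_net, optimizer, batch, gamma=0.99):  # gamma value is an assumption
    states = torch.stack([t[0] for t in batch])
    actions = torch.tensor([t[1] for t in batch])
    # Reward clipping: positive -> +1, negative -> -1, zero unchanged.
    rewards = torch.tensor([float(t[2] > 0) - float(t[2] < 0) for t in batch])
    next_states = torch.stack([t[3] for t in batch])
    dones = torch.tensor([float(t[4]) for t in batch])

    # Q(s, a) for the actions that were actually taken.
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Target: r for terminal transitions, r + gamma * max_a' Q(s', a') otherwise.
    with torch.no_grad():
        next_q = q_net(next_states).max(dim=1).values
    targets = rewards + gamma * (1.0 - dones) * next_q

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

An RMSProp optimizer, e.g. torch.optim.RMSprop(q_net.parameters()), would drive these updates; the clipped rewards keep the error scale comparable, which is what allows one learning rate to be shared across games.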
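Finally, the preprocessing and frame-skipping steps might look roughly like the sketch below. The environment interface, the use of OpenCV for resizing, and the plain 84x84 resize are simplifying assumptions; the paper grayscales, downsamples, and crops to an 84x84 playing-area region.

```python
from collections import deque

import cv2          # assumed here for grayscale conversion and resizing
import numpy as np


def preprocess(frame: np.ndarray) -> np.ndarray:
    """Reduce a raw RGB frame to an 84x84 grayscale image (simplified cropping)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)


class FrameSkipStacker:
    """Repeat each chosen action for k frames and keep the last 4 preprocessed frames as the state."""

    def __init__(self, env, k: int = 4):   # k=3 for Space Invaders so the lasers remain visible
        self.env = env
        self.k = k
        self.frames = deque(maxlen=4)

    def reset(self) -> np.ndarray:
        frame = preprocess(self.env.reset())
        for _ in range(4):
            self.frames.append(frame)
        return np.stack(self.frames)                     # state shape: (4, 84, 84)

    def step(self, action):
        total_reward, done = 0.0, False
        for _ in range(self.k):                          # the action is repeated on skipped frames
            frame, reward, done = self.env.step(action)  # hypothetical environment interface
            total_reward += reward
            if done:
                break
        self.frames.append(preprocess(frame))
        return np.stack(self.frames), total_reward, done
```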