Reinforcement Learning Using Neural Networks, with Applications to Motor Control

PhD thesis, by Rémi Coulom


This thesis is a study of practical methods to estimate value functions with feedforward neural networks in model-based reinforcement learning. The focus is on problems in continuous time and space, such as motor-control tasks. In this work, the continuous TD(lambda) algorithm is refined to handle situations with discontinuous states and controls, and the vario-eta algorithm is proposed as a simple but efficient method to perform gradient descent. The main contributions of this thesis are experimental successes that clearly indicate the potential of feedforward neural networks to estimate high-dimensional value functions. Linear function approximators have often been preferred in reinforcement learning, but their success is restricted to relatively simple mechanical systems, or they require a lot of prior knowledge. The method presented in this thesis was tested successfully on an original task: a simulated articulated robot learning to swim, with 4 control variables and 12 independent state variables.


The thesis (only the first pages are in French; the rest is in English):


For those who cannot run the Win32 demos below, some AVI movies demonstrating the movements of the swimmers (DivX codec required):

A few interactive (Win32) swimmer demos (click in the window to change the swimming direction):

Source code of the swimmer simulator:

RARS demo:


 author = "R\'emi Coulom",
 title = "Reinforcement Learning Using Neural Networks, with Applications
          to Motor Control",
 school = "Institut National Polytechnique de Grenoble",
 year = 2002