Prof G Csanyi
Dr Richard Turner
Dr José Miguel Hernández-Lobato
Timing and Structure
Fridays 11-1pm and Tuesdays 9-11am plus afternoons
Part I computing; either 3F3 or 3F8
The aims of the course are to:
- expose students to machine learning approaches to non-linear regression and model-based reinforcement learning
- give students the practical experience necessary to use these techniques successfully (e.g. the use of training and test sets for evaluation, automatic differentiation for optimisation, etc.)
- develop an understanding of the robustness of these approaches to challenging real-world phenomena, including noise and non-linearities
Note: This is a new project this year; some of the details below may evolve as the project content is developed further during the Michaelmas and Lent Terms.
In this project, students will study the inverted pendulum system. They will receive a software simulator, written in Python, of a cart with a pendulum attached.
The goal will be to learn a controller that balances the pendulum in a data-driven way. The students will initially learn how to operate the simulator and explore the different types of behaviour that the system can exhibit. Next, they will collect training data from the simulator and use this to train non-linear regression models, including linear regression with non-linear basis functions. The trained models will be assessed on test data from the simulator. Once accurate models have been learned, these will be used to learn controllers that can bring the pendulum to the upright position and keep it there. Data-efficient model-based reinforcement learning techniques will be used for this stage. Finally, the controllers and the models will be stress-tested in various ways to assess their robustness.
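As a rough illustration of "linear regression with non-linear basis functions" and of evaluation on held-out test data, the sketch below fits Gaussian radial basis functions to a toy one-dimensional target. It is not the project code: the synthetic sine data stands in for simulator output, and all names (`rbf_features`, `fit_ridge`, the number of centres, the lengthscale and the regulariser) are assumptions chosen for the example.

```python
import numpy as np

def rbf_features(X, centres, lengthscale):
    """Gaussian basis functions evaluated at each input (rows of X)."""
    # Squared distance between every input and every centre.
    sq_dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def fit_ridge(Phi, y, reg=1e-3):
    """Regularised least squares: the model is linear in the weights."""
    A = Phi.T @ Phi + reg * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

def rmse(pred, target):
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Toy data (assumption): a noisy non-linear function of one input.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(400, 1))
y = np.sin(2.0 * X[:, 0]) + 0.05 * rng.standard_normal(400)

# Hold out the last 100 points as a test set.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

# Non-linear model: 20 evenly spaced centres, lengthscale 0.5.
centres = np.linspace(-3.0, 3.0, 20)[:, None]
weights = fit_ridge(rbf_features(X_train, centres, 0.5), y_train)
test_rmse = rmse(rbf_features(X_test, centres, 0.5) @ weights, y_test)

# Baseline: ordinary linear regression on the raw input plus a bias term.
def linear_features(X):
    return np.hstack([X, np.ones((len(X), 1))])

lin_weights = fit_ridge(linear_features(X_train), y_train)
linear_rmse = rmse(linear_features(X_test) @ lin_weights, y_test)

print(f"test RMSE, RBF model:    {test_rmse:.3f}")
print(f"test RMSE, linear model: {linear_rmse:.3f}")
```

Comparing the two test errors shows why the basis functions matter: the model is still linear in its weights, so fitting stays a linear solve, but it can represent a non-linear input-output map.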
Students work individually for this project.
Explore the cart-pendulum system using the simulator. Understand the state space and the governing differential equations.
Gather training and test data from the simulator for building models of the system and validating them. Fit various models and assess their quality.
Define a function that maps from the system's state to control actions (the "policy"), and optimise the policy to keep the pendulum upright.
Stress-test control and learning systems in various ways.
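To make the idea of a policy concrete, here is a minimal sketch of a linear state-feedback policy applied to a deliberately simplified system. It assumes a pendulum driven directly at its pivot, with no cart, integrated with explicit Euler steps; it is not the project simulator, and the dynamics, the gains and every function name are hypothetical. The gains are picked by hand rather than optimised.

```python
import numpy as np

GRAVITY_OVER_LENGTH = 9.81  # g / l for a 1 m pendulum (assumed value)

def dynamics(state, action):
    """Time derivative of the state [theta, theta_dot].

    theta is measured from the upright position; the action is an
    angular acceleration applied at the pivot (simplified model, no cart).
    """
    theta, theta_dot = state
    return np.array([theta_dot, GRAVITY_OVER_LENGTH * np.sin(theta) + action])

def linear_policy(state, gains):
    """Policy: map the current state to a control action."""
    return float(-gains @ state)

def zero_policy(state, gains):
    """No control, for comparison."""
    return 0.0

def rollout(policy, gains, initial_state, dt=0.01, n_steps=500):
    """Simulate the closed loop with explicit Euler steps."""
    state = np.array(initial_state, dtype=float)
    trajectory = [state.copy()]
    for _ in range(n_steps):
        action = policy(state, gains)
        state = state + dt * dynamics(state, action)
        trajectory.append(state.copy())
    return np.array(trajectory)

gains = np.array([30.0, 10.0])   # hand-picked, not optimised
start = [0.2, 0.0]               # 0.2 rad from upright, at rest

controlled = rollout(linear_policy, gains, start)
uncontrolled = rollout(zero_policy, gains, start)

print("final |theta| with policy:   ", abs(controlled[-1, 0]))
print("largest |theta| with no control:", np.abs(uncontrolled[:, 0]).max())
```

In the project itself the gains would be chosen by optimising a loss over rollouts rather than by hand, and the rollouts could come from a learned model instead of known equations.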
4pm Sunday 17 May 2020
4pm Thursday 4 June 2020
Please refer to Form & conduct of the examinations.