• Important note for 2020

    The reinforcement learning course will be organized entirely online. The remote teaching events (lectures, TA sessions, etc.) will follow the schedule announced for the course. We will primarily use Zoom and Slack for interaction, with some additional tools still under consideration.


    The course provides an overview of the mathematical models and algorithms behind optimal decision making in sequential, time-series systems. The course focuses on optimal decision making and control, reinforcement learning, and decision making under uncertainty.

    Please join the Slack channel to receive the latest updates and to ask questions about the exercises.

    Students who are active on Slack helping their peers can receive an extra 10% on top of the grade of each exercise. When joining the Slack channel, please use your full name so that grades can be assigned correctly.

    Please read carefully the Setting Things Up and Submission Instructions document, also available on the Assignments page. Assignments will be graded according to the rules in that document.

  • Concept

    Lectures will be places of discussion where the lecturer summarizes the current topic and everyone present discusses it. Students are expected to prepare by reading the given material before each lecture.


    Course lectures will be given during the first and second teaching periods on Tuesdays, 14:15-16:00. All lectures will be given over Zoom at https://aalto.zoom.us/j/63645678644. Please download and install Zoom before the first lecture to attend the course. Lecture recordings will likely be available afterwards, but this cannot be guaranteed due to potential technical issues. The lectures are interactive, so active participation is encouraged.

    Schedule and Readings

    For each lecture, starting from the second one, there will be reading materials that students should study before attending the lecture.

    • Course arrangements, Overview, Tue 8.9., no readings

    • Markov decision processes, Tue 15.9., Sutton & Barto, Ch. 2-2.3, 2.5-2.6, 3-3.8

    • RL in discrete domains (value-based RL), Tue 22.9., Sutton & Barto, Ch. 5-5.4, 5.6, 6-6.5

    • Function approximation, Tue 29.9., Sutton & Barto, Ch. 9-9.3, 10-10.1

    • Policy gradient, Tue 6.10., Sutton & Barto, Ch. 13-13.3

    • Actor-critic, Tue 13.10., Sutton & Barto, Ch. 13.5, 13.7

    • Towards model-based reinforcement learning: optimal control, Tue 20.10., Platt: Introduction to Linear Quadratic Regulation

    • Model-based reinforcement learning, Tue 27.10., Sutton & Barto, Ch. 8-8.2

    • Guest lecture: Safety and constraints, Tue 3.11.

    • Partially observable MDPs, Tue 10.11., Anthony Cassandra, POMDP tutorial, http://www.pomdp.org/tutorial/, steps from "Brief Introduction to MDPs" until "General Form of a POMDP solution"

    • Large POMDPs, Tue 17.11.

    • Project show!, Tue 8.12. (to be confirmed)

  • Rules and arrangements

    The course will have six compulsory individual assignments, which together make up 50% of the final grade. The assignments will be introduced in the exercise sessions, and instructions will appear on this page. Assignments are supervised by the TAs.

    Each assignment will be graded, and the grades count towards the final course grade.

    Keep in mind that the assignments and quizzes are to be completed individually by each student. While it is perfectly fine (and even encouraged!) to discuss the algorithms, implementations and the concepts taught in the course with your peers, directly sharing answers, data or code will not be accepted. In short—share ideas, not answers.

    Remember to submit all your solutions on time, and double-check that your submission contains all the necessary files, as listed at the end of the assignment instruction document. We do not accept any submissions or additional files after the submission system in MyCourses closes. The only exceptions are well-justified cases such as illness (supported by a proper certificate) or military service. If you cannot submit an assignment on time due to university-related reasons, such as attending a conference, please inform the course staff in advance.

    Exercise Sessions

    There are three exercise sessions per week. The hours and Zoom links for each session are:

    H02 Exercises - Monday, 12:15-14:00 -  https://aalto.zoom.us/j/66005484848

    H03 Exercises - Tuesday, 12:15-14:00 -  https://aalto.zoom.us/j/63575315747

    H01 Exercises - Wednesday, 10:15-12:00 -  https://aalto.zoom.us/j/66492815789


    The quizzes are individual work and must be completed independently. The answers can be found in the lectures and readings. The quizzes will make up 20% of the final grade.


    The hard deadline for each assignment is listed below. Each assignment is due two weeks (now extended to three weeks) after it is uploaded, on the Monday before the exercise session.

    • Exercise 1 - 21.9.2020 

    • Exercise 2 - 28.9.2020

    • Exercise 3 - 12.10.2020

    • Exercise 4 - 19.10.2020

    • Exercise 5 - 26.10.2020

    • Exercise 6 - 02.11.2020

  • Overview

    The course includes a final project in which students apply the knowledge gathered throughout the course to a specific problem.

    Course Project

    The course project topic is to implement and train a Reinforcement Learning agent to play Pong.
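    As a rough illustration (not the official project template), a Pong agent interacts with its environment through the standard observe-act-learn loop. The sketch below uses a self-contained dummy environment and a random-action baseline; the class names, the Gym-style `reset`/`step` interface, and the toy observations and rewards are all hypothetical stand-ins, not the actual WimblePong API:

    ```python
    import random


    class DummyPongEnv:
        """Hypothetical stand-in environment with a Gym-style interface
        (NOT the actual WimblePong API)."""

        def __init__(self, max_steps=100):
            self.max_steps = max_steps
            self.t = 0

        def reset(self):
            """Start a new episode and return the initial observation."""
            self.t = 0
            return (0.0, 0.0)  # toy observation: (ball_y, paddle_y)

        def step(self, action):
            """Advance one timestep; return (observation, reward, done)."""
            self.t += 1
            obs = (random.uniform(-1, 1), random.uniform(-1, 1))
            reward = 1.0 if action == 1 else 0.0  # toy reward signal
            done = self.t >= self.max_steps
            return obs, reward, done


    class RandomAgent:
        """Baseline agent: picks one of three actions (stay/up/down)
        uniformly at random, ignoring the observation."""

        def get_action(self, obs):
            return random.choice([0, 1, 2])


    def run_episode(env, agent):
        """Run one episode and return the total accumulated reward."""
        obs = env.reset()
        total_reward = 0.0
        done = False
        while not done:
            action = agent.get_action(obs)
            obs, reward, done = env.step(action)
            total_reward += reward
        return total_reward


    if __name__ == "__main__":
        env = DummyPongEnv()
        print(run_episode(env, RandomAgent()))
    ```

    A learning agent would replace `RandomAgent` with one of the methods from the lectures (e.g. a policy-gradient or actor-critic agent) and update its parameters from the observed rewards inside the loop.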

    [Videos: WimblePong Semifinal 1, WimblePong Semifinal 2]

    The course project is done in pairs. The project instructions will be released by week 43 (19.10-25.10).

    Alternative Project

    Alternatively, students can propose their own project topic. This option is mainly aimed at PhD students who want to apply reinforcement learning to their own field, but Master's students are also welcome to choose it.

    Alternative project topics are individual. Topics must be submitted by week 43 (19.10-25.10); the course staff will evaluate each proposal, and work can start once the topic is approved.


    The course project grade accounts for 30% of the final grade of the course.