Topic outline

  • Important note for 2021

    The reinforcement learning course will be organized mostly remotely (online). The remote teaching events (lectures, TA sessions, etc.) will follow the schedule announced for the course. We will primarily use Zoom and Slack for interaction; some additional tools are still under consideration.


    The course provides an overview of the mathematical models and algorithms behind optimal decision making in time-series systems. The course focuses on optimal decision making and control, reinforcement learning, and decision making under uncertainty.

    Please join the Slack channel to receive the latest updates and to ask questions about the exercises. Please use your Aalto account to register on Slack.

    Those who are active on Slack helping their peers can receive an extra 10% on top of the grade of each exercise. When joining the Slack channel, please use your full name so that the grades are assigned correctly.

    Please carefully read the Setting Things Up and Submission Instructions document, also available on the Assignments page. The assignments will be graded according to the rules in that document.

  • Concept

    Lectures are places of discussion: the lecturer summarizes the current topic, which is then discussed among all present. Students are expected to prepare by reading the given material before each lecture.


    Course lectures will be given during the first and second periods on Tuesdays 14:15-16:00. All lectures will be given over Zoom. Please download and install Zoom before the first lecture to attend the course. Lecture recordings will likely be available afterwards, but this cannot be guaranteed due to potential technical issues. The lectures are interactive, so participation is encouraged.

    Course lectures will be given by Ville Kyrki (first part) and Joni Pajarinen (second part).

    Schedule and Readings

    For each lecture starting from the third one, there will be reading materials that the students should study before attending the lecture.

    Course arrangements, Overview, Tue 14.9., no readings

    Markov decision processes, Tue 21.9., Sutton & Barto, Ch. 2-2.3, 2.5-2.6, 3-3.8

    RL in discrete domains (value-based RL), Tue 28.9., Sutton & Barto, Ch. 5-5.4, 5.6, 6-6.5

    Function approximation, Tue 5.10., Sutton & Barto, Ch. 9-9.3, 10-10.1

    Policy gradient, Tue 12.10., Sutton & Barto, Ch. 13-13.3

    Actor-critic, Tue 19.10., Sutton & Barto, Ch. 13.5, 13.7

    Towards model-based reinforcement learning: optimal control, Tue 26.10., Platt: Introduction to Linear Quadratic Regulation

    Model-based reinforcement learning, Tue 2.11., Sutton & Barto, Ch. 8-8.2

    Guest lectures Tue 9.11.: Safety and constraints (Gökhan Alcal), Entropy Regularization in Reinforcement Learning (Riad Akrour)

    Partially observable MDPs, Tue 16.11., Anthony Cassandra, POMDP tutorial, steps from "Brief Introduction to MDPs" until "General Form of a POMDP solution"

    Large POMDPs, Tue 23.11.

    Project show!, to be confirmed

  • Rules and arrangements

    The course will have six compulsory individual assignments making up 50% of the final grade. The assignments will be introduced in the exercise sessions. Instructions and materials will appear on this page.

    Each assignment will be graded, and the assignment grades count towards the course grade.

    Keep in mind that the assignments and quizzes are to be completed individually by each student. While it is perfectly fine to discuss the algorithms, implementations and concepts taught in the course with your peers, directly sharing answers, data or code will not be accepted. In short: share ideas, not answers.

    Remember to submit all your solutions on time, and double-check that your submission contains all the necessary files, as listed at the end of the assignment instruction document. We do not accept any submissions or additional files after the submission system in MyCourses closes. The only exceptions are well-justified cases such as illness (supported by a proper certificate) or military service. If you cannot submit an assignment on time due to university-related reasons, such as attending a conference, please inform the course staff in advance.

    Exercise Sessions

    There are three exercise sessions per week in which you can ask questions about the lectures and exercises. Attendance is optional. The sessions take place online:

    H02 Exercises - Monday 12:15-14:00

    H03 Exercises - Tuesday 12:15-14:00

    H01 Exercises - Wednesday 10:15-12:00


    The quizzes are individual work and should be completed independently. The answers can be found in the lectures and readings. The quizzes make up 20% of the final grade.


    The hard deadlines for the assignments are listed below. Each deadline is three weeks after the assignment is released.

    • Exercise 1 - Release: 13.09.2021 - Deadline: 04.10.2021

    • Exercise 2 - Release: 20.09.2021 - Deadline: 11.10.2021

    • Exercise 3 - Release: 27.09.2021 - Deadline: 18.10.2021

    • Exercise 4 - Release: 04.10.2021 - Deadline: 25.10.2021

    • Exercise 5 - Release: 11.10.2021 - Deadline: 01.11.2021

    • Exercise 6 - Release: 18.10.2021 - Deadline: 08.11.2021
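
    The three-week rule can be checked with a few lines of Python. This is only a minimal sketch for planning purposes: the release dates are copied from the list above, while the names RELEASES, SUBMISSION_WINDOW and deadline are hypothetical and not part of any course material. The dates in MyCourses remain authoritative.

```python
from datetime import date, timedelta

# Release dates copied from the list above (all in 2021).
RELEASES = {
    1: date(2021, 9, 13),
    2: date(2021, 9, 20),
    3: date(2021, 9, 27),
    4: date(2021, 10, 4),
    5: date(2021, 10, 11),
    6: date(2021, 10, 18),
}

# Assumption: every assignment has the same three-week window.
SUBMISSION_WINDOW = timedelta(weeks=3)


def deadline(release: date) -> date:
    """Return the deadline date for an assignment released on `release`."""
    return release + SUBMISSION_WINDOW


if __name__ == "__main__":
    for number, release in RELEASES.items():
        print(
            f"Exercise {number}: release {release:%d.%m.%Y}, "
            f"deadline {deadline(release):%d.%m.%Y}"
        )
```

    Running the script prints one line per exercise, which can be compared against the list above.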

  • Overview

    The course has a final project to apply the knowledge gathered throughout the course to a specific problem.

    Course Project

    In the project work, we will implement and apply some more advanced RL algorithms in continuous control tasks. The project work consists of two parts. First, two widely used reinforcement learning algorithms, TD3 and PPO, will be implemented. For this part, we will provide base code so that you can get started easily. After finishing this part, you will be able to train a policy to balance a cart pole and to control a HalfCheetah to run forward.

    In the second part, you need to read some research papers and implement their proposed algorithms based on the code finished in Part I. The candidate algorithms in Part II include
    You can choose one of them according to your preference, study the paper, and implement the algorithm. For the listed algorithms, we will provide a reference training curve. If you are interested in other algorithms, you may also choose them for Part II, but we cannot offer much help with implementing them.

    The project work is to be done in groups of two students. If you need to find a partner for the project, please join the project channel on Slack and advertise yourself. The deadline for both the course project and the alternative project is 05.12.2021 at 23:55.

    Alternative Project

    Alternatively, students can propose their own project topic. This option is mainly aimed at PhD students who want to apply reinforcement learning to their own field, but Master's students are also encouraged to use it.

    Alternative project topics are individual. The project proposal must be submitted and will be evaluated by the course staff; the project can be started once the proposal is approved.

    The deadline for the alternative course project proposal is 29.10.2021 at 23:55.


    The course project grade accounts for 30% of the final grade of the course.
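
    The three components stated on this page (assignments 50%, quizzes 20%, course project 30%) add up to 100%. As a rough illustration of how they combine, the sketch below computes a weighted overall score. It rests on assumptions that this page does not confirm: each component is expressed on a 0-100 scale, the Slack activity bonus is left out, and the mapping from the overall score to the final grade is not modeled. The names WEIGHTS_PERCENT and overall_score are hypothetical.

```python
from typing import Mapping

# Weights in percent, as stated on this page.
WEIGHTS_PERCENT = {"assignments": 50, "quizzes": 20, "project": 30}


def overall_score(scores: Mapping[str, float]) -> float:
    """Weighted average of component scores.

    Assumption: every component score is on a 0-100 scale.
    The Slack bonus and the score-to-grade mapping are not modeled.
    """
    if set(scores) != set(WEIGHTS_PERCENT):
        raise ValueError(f"expected components {sorted(WEIGHTS_PERCENT)}")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be between 0 and 100")
    total = sum(WEIGHTS_PERCENT[name] * scores[name] for name in WEIGHTS_PERCENT)
    return total / 100


if __name__ == "__main__":
    example = {"assignments": 80, "quizzes": 90, "project": 70}
    print(overall_score(example))
```

    With the example scores, the contributions are 40 from the assignments, 18 from the quizzes and 21 from the project.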