Topic outline

  • The course provides an overview of the mathematical models and algorithms behind optimal decision making in time-series systems. It focuses on optimal decision making and control, reinforcement learning, and decision making under uncertainty.

    Practical matters

    Lecturer: Joni Pajarinen.

    Teaching assistants (TAs):  Aidan Scannell, Vivienne Wang, Wenyan Yang, Mohammadreza Nakhaei, Yuying Zhang, Wenshuai Zhao, Nikita Kostin, Yi Zhao, Taha Heidari

    • Lectures are organized as follows:
      • Location: Maarintie 8, AS1
      • Time: Tuesdays 14:15–16:00 (Period I, II). Note: the first lecture is on Monday 4.9.2023 at 12:15–14:00 in room T1 (Computer Science building).
      • In-person participation is encouraged for the full lecture experience, but lectures will also be recorded and can be watched afterwards.
    • Grading scale: 0-5
      • Individual assignments (60%)
      • One project work, in groups of at most 2 students (20%)
      • Quizzes, due before the lecture (20%)
    • Exercise sessions will be given twice a week. Attendance is optional.
      • Mondays 12:15–14:00, 11.9.–20.11.2023, Maarintie 8, TU3
      • Wednesdays 10:15–12:00, 6.9.–29.11.2023, Maarintie 8, AS3 Saab Space
    • Please join the course Zulip to receive the latest updates and to ask questions about the exercises. Register with your Aalto account. Note that the Zulip channel is the main place where we answer questions about the exercises.
    • Each student has 3 days in total for late submissions.


    Lecture Schedule

    Week  Lecture                               Date       Reading
    W36   L1 Course Overview                    Mon 4.9    No readings
    W37   L2 Markov decision processes          Tue 12.9   Sutton & Barto, Ch. 2-2.3, 2.5-2.6, 3-3.8
    W38   L3 RL in discrete domains             Tue 19.9   Sutton & Barto, Ch. 5-5.4, 5.6, 6-6.5
    W39   L4 Function approximation             Tue 26.9   Sutton & Barto, Ch. 9-9.3, 10-10.1
    W40   L5 Policy gradient                    Tue 3.10   Sutton & Barto, Ch. 13-13.3
    W41   L6 Actor-critic                       Tue 10.10  Sutton & Barto, Ch. 13.5, 13.7
    W42   No lecture                            Tue 17.10
    W43   L7 Model-based RL                     Tue 24.10  Sutton & Barto, Ch. 8-8.2
    W44   L8 Interleaved learning and planning  Tue 31.10  Sutton & Barto, Ch. 8-8.2
    W45   L9 Exploration and exploitation       Tue 7.11   1) Sutton & Barto, Ch. 2.7, 8.9-8.11 and 2) Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2018). A tutorial on Thompson sampling. Foundations and Trends in Machine Learning, 11(1), 1-96. Sections 2, 3, 4
    W46   L10 Guest lecture (TBD)               Tue 14.11
    W47   L11 Partially observable MDPs         Tue 21.11  1) Anthony Cassandra, POMDP tutorial, steps from "Brief Introduction to MDPs" until "Background on POMDPs" and 2) Partially Observable Markov Decision Processes in Robotics: A Survey, Sections II.A, III.B, III.C
    W48   No lecture                            Tue 28.11

    Quiz Schedule

    Quiz    Release  Deadline (always before the lecture)
    Quiz 1  Sep 5    Sep 12
    Quiz 2  Sep 12   Sep 19
    Quiz 3  Sep 19   Sep 26
    Quiz 4  Sep 26   Oct 3
    Quiz 5  Oct 3    Oct 10
    Quiz 6  Oct 10   Oct 24
    Quiz 7  Oct 24   Nov 7
    Quiz 8  Nov 7    Nov 21

    Exercise & Project Schedule

    Exercises & Project  Release  Deadline
    Exercise 1           Sep 5    Sep 18 @23:59
    Exercise 2           Sep 13   Sep 25 @23:59
    Exercise 3           Sep 20   Oct 2 @23:59
    Exercise 4           Sep 27   Oct 9 @23:59
    Exercise 5           Oct 4    Oct 23 @23:59
    Exercise 6           Oct 11   Nov 6 @23:59
    Exercise 7           Oct 25   Nov 20 @23:59
    Project              Oct 18   Dec 4 @23:59

    Who to contact

    If you need help with the exercises or the project work, post your questions in the corresponding Zulip channel or attend an exercise session. If you need to contact the TAs directly, here is the list of who is responsible for what:

    Ex/Proj  TAs
    Ex1      Wenyan, Mohammadreza
    Ex2      Wenshuai, Yi
    Ex3      Nikita, Vivienne
    Ex4      Wenyan, Yuying
    Ex5      Nikita, Vivienne
    Ex6      Wenshuai, Yuying
    Ex7      Mohammadreza, Nikita
    Proj     Taha, Wenshuai, Wenyan

    For other matters (for example, military service), you can contact Prof. Joni Pajarinen directly.

  • Rules and arrangements

    The course will have seven compulsory individual assignments making up 60% of the final grade. Instructions and materials will appear on this page.

    Each assignment will be graded. Together with the project work (20%) and the quizzes (20%), the assignments determine the course grade.

    Keep in mind that the assignments and quizzes are to be completed individually by each student. While it is perfectly fine to discuss the algorithms, implementations and concepts taught in the course with your peers, directly sharing answers, data or code will not be accepted. In short—share ideas, not answers.

    Remember to submit all your solutions on time, and double-check that your submission contains all the necessary files, as listed at the end of the assignment instruction document.

    Each student has 3 days in total for late submissions. Any submission made after the deadline uses at least 1 late day. Once the 3 days are used up, we do not accept any further late submissions or additional files. The only exceptions are well-justified cases such as illness (supported by a proper certificate) or military service. If you cannot submit an assignment on time for university-related reasons, such as attending a conference, please inform the course staff in advance.
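
    As an illustration of the late-day rule, the sketch below counts late days for a submission. It assumes that lateness is rounded up to whole days, which is one reading of "at least 1 day once past the deadline"; the course staff's actual accounting may differ. The function names are hypothetical and not part of any course tooling.

```python
import math
from datetime import datetime

LATE_DAY_BUDGET = 3  # total late days per student, as stated in the rules above


def late_days_used(deadline: datetime, submitted: datetime) -> int:
    """Whole late days consumed by one submission.

    Assumption: any lateness is rounded UP to a full day.
    """
    seconds_late = (submitted - deadline).total_seconds()
    if seconds_late <= 0:
        return 0  # on time, no late day used
    return math.ceil(seconds_late / 86400)  # 86400 seconds per day


def remaining_late_days(submissions) -> int:
    """Late days left after a list of (deadline, submitted) pairs."""
    used = sum(late_days_used(d, s) for d, s in submissions)
    return LATE_DAY_BUDGET - used


# Exercise 1 is due Sep 18 at 23:59; a submission 31 minutes later
# already counts as 1 late day under the rounding assumption.
ex1_deadline = datetime(2023, 9, 18, 23, 59)
print(late_days_used(ex1_deadline, datetime(2023, 9, 19, 0, 30)))  # 1
```

    A negative result from `remaining_late_days` would mean the budget has been exceeded, in which case the submission would not be accepted under the rule above.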


    The quizzes are individual work and should be completed independently. The answers can be found in the lectures and readings. The quizzes make up 20% of the final grade.