The course provides an overview of the mathematical models and algorithms behind optimal decision making in time-series systems. It focuses on optimal decision making and control, reinforcement learning, and decision making under uncertainty.
Lecturer: Joni Pajarinen.
Teaching assistants (TAs): Yi Zhao, Aleksi Ikkala, Wenshuai Zhao, Nikita Kostin, Ali Khoshvishkaie, Jifei Deng, Mohammadreza Nakhaei.
- The reinforcement learning lecture will be organized in person this year.
- Location: Maarintie 8, AS1
- Time: Tuesdays 14:15-16:00 (Period I, II)
- Although in-person participation is encouraged for the full lecture experience, lectures will also be recorded and can be watched afterwards.
- Grading Scale: 0-5
- 7 individual assignments (60%)
- 1 project work, in groups (max. 2 students) (20%)
- Quizzes (due before lecture) (20%)
- Exercise sessions will be given twice a week. Attendance is optional.
- (Remotely) Mondays 12.15–14.00, on Zoom (links will be given during sessions)
- (In person) Wednesdays 10.15–12.00, Maarintie 8, AS3 Saab Space
- Please join the Slack channel to receive the latest updates and ask questions about the exercises. Please use your Aalto account to register on Slack. Note that the Slack channel is the main place where we answer questions about the exercises.
- Each student has 3 days in total for late submissions.
|W36||L1 Course Overview||Tue, 6.9||no readings||Ex1 (6.9)||-|
|W37||L2 Markov decision processes||Tue, 13.9||Sutton & Barto, Ch. 2-2.3, 2.5-2.6, 3-3.8||Ex2 (13.9)||-|
|W38||L3 RL in discrete domains||Tue, 20.9||Sutton & Barto, Ch. 5-5.4, 5.6, 6-6.5||Ex3 (20.9)||Ex1 (19.9)|
|W39||L4 Function approximation||Tue, 27.9||Sutton & Barto, Ch. 9-9.3, 10-10.1||Ex4 (27.9)||Ex2 (26.9)|
|W40||L5 Policy gradient||Tue, 4.10||Sutton & Barto, Ch. 13-13.3||Ex5 (4.10)||Ex3 (3.10)|
|W41||L6 Actor-critic||Tue, 11.10||Sutton & Barto, Ch. 13.5, 13.7||Ex6 (11.10)||Ex4 (10.10)|
|W42||No Lecture||Tue, 18.10|
|W43||L7 Model-based RL||Tue, 25.10||Sutton & Barto, Ch. 8 - 8.2||Ex5(24.10)|
|W44||L8 Interleaved learning and planning||Tue, 1.11||Sutton & Barto, Ch. 8 - 8.2||Proj (1.11)|
|W45||L9 Exploration and exploitation||Tue, 8.11||1) Sutton & Barto, Ch. 2.7, 8.9-8.11; 2) Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2018). A tutorial on Thompson sampling. Foundations and Trends in Machine Learning, 11(1), 1-96. https://web.stanford.edu/~bvr/pubs/TS_Tutorial_FnT.pdf Sections 2, 3, 4|
|W46||L10 Guest lecture (Aidan Scannell): Model-based reinforcement learning under uncertainty: the importance of knowing what you don't know|
|W47||L11 Partially observable MDPs||Tue, 22.11||1) Anthony Cassandra, POMDP tutorial, http://www.pomdp.org/tutorial/, steps from "Brief Introduction to MDPs" until "Background on POMDPs"; 2) Partially Observable Markov Decision Processes in Robotics: A Survey. https://arxiv.org/pdf/2209.10342 Sections II.A, III.B, III.C|
|W48||No Lecture||Tue, 29.11||Project (12.12)|
Who to contact
Usually, if you need help with the exercises or project work, you can post your questions in the corresponding Slack channel or attend an exercise session. If you need to contact the TAs in person, here is the list:
If you have other questions (for example, about illness or military service), you can contact Prof. Joni Pajarinen directly.