Course: ELEC-E8125 - Reinforcement learning D, Lecture, 5.9.2022-30.11.2022

Overview

The course provides an overview of the mathematical models and algorithms behind optimal decision making in time-series systems. It focuses on optimal decision making and control, reinforcement learning, and decision making under uncertainty.

Practical matters

Lecturer: Joni Pajarinen.

Teaching assistants (TAs): Yi Zhao, Aleksi Ikkala, Wenshuai Zhao, Nikita Kostin, Ali Khoshvishkaie, Jifei Deng, Mohammadreza Nakhaei.

The reinforcement learning lectures will be held in person this year.
- Location: Maarintie 8, AS1
- Time: Tuesdays 14:15-16:00 (Period I, II)
- Although in-person participation is encouraged for the full lecture experience, lectures will also be recorded and can be watched afterwards.
Grading scale: 0-5
- 7 individual assignments (60%)
- 1 project work, in groups (max. 2 students) (20%)
- Quizzes (due before lecture) (20%)
Exercise sessions will be given twice a week. Attendance is optional.
- (Remotely) Mondays 12:15-14:00, Zoom (links will be given during sessions)
- (In person) Wednesdays 10:15-12:00, Maarintie 8, AS3 Saab Space
Please join the Slack channel to receive the latest updates and ask questions about the exercises. Please use your Aalto account to register for Slack. Note that the Slack channel will be the main place where questions about the exercises are answered.
Each student has 3 days in total for late submissions.

Schedule

Week	Lecture	Lecture Date	Reading	Events	Deadline
W36	L1 Course Overview	Tue, 6.9	no readings	Ex1 (6.9)	-
W37	L2 Markov decision processes	Tue, 13.9	Sutton & Barto, Ch. 2-2.3, 2.5-2.6, 3-3.8	Ex2 (13.9)	-
W38	L3 RL in discrete domains	Tue, 20.9	Sutton & Barto, Ch. 5-5.4, 5.6, 6-6.5	Ex3 (20.9)	Ex1 (19.9)
W39	L4 Function approximation	Tue, 27.9	Sutton & Barto, Ch. 9-9.3, 10-10.1	Ex4 (27.9)	Ex2 (26.9)
W40	L5 Policy gradient	Tue, 4.10	Sutton & Barto, Ch. 13-13.3	Ex5 (4.10)	Ex3 (3.10)
W41	L6 Actor-critic	Tue, 11.10	Sutton & Barto, Ch. 13.5, 13.7	Ex6 (11.10)	Ex4 (10.10)
W42	No Lecture	Tue, 18.10
W43	L7 Model-based RL	Tue, 25.10	Sutton & Barto, Ch. 8-8.2		Ex5 (24.10)
W44	L8 Interleaved learning and planning	Tue, 1.11	Sutton & Barto, Ch. 8-8.2	Proj (1.11)
W45	L9 Exploration and exploitation	Tue, 8.11	1) Sutton & Barto, Ch. 2.7, 8.9-8.11 and 2) Russo, D. J., Van Roy, B., Kazerouni, A., Osband, I., & Wen, Z. (2018). A tutorial on Thompson sampling. Foundations and Trends in Machine Learning, 11(1), 1-96. https://web.stanford.edu/~bvr/pubs/TS_Tutorial_FnT.pdf, Sections 2, 3, 4	Ex7 (8.11)	Ex6 (7.11)
W46	L10 Guest lecture (Aidan Scannell). Model-based reinforcement learning under uncertainty: the importance of knowing what you don't know	Tue, 15.11
W47	L11 Partially observable MDPs	Tue, 22.11	1) Anthony Cassandra, POMDP tutorial, http://www.pomdp.org/tutorial/, steps from "Brief Introduction to MDPs" until "Background on POMDPs" and 2) Partially Observable Markov Decision Processes in Robotics: A Survey. https://arxiv.org/pdf/2209.10342, Sections II.A, III.B, III.C		Ex7 (21.11)
W48	No Lecture	Tue, 29.11			Project (12.12)

Who to contact

If you need help with the exercises or the project work, you can usually post your questions in the corresponding Slack channel or attend an exercise session. If you need to contact the TAs directly, here is the list:

Ex/Proj	TAs
Ex1	Aleksi, Yi
Ex2	Jifei, Wenshuai
Ex3	Ali, Nikita
Ex4	Jifei, Yi
Ex5	Ali, Aleksi
Ex6	Mohammadreza, Wenshuai
Ex7	Mohammadreza, Yi
Proj	Nikita

For other matters (such as illness or military service), you can contact Prof. Joni Pajarinen directly.
