Please note! The course description is confirmed for two academic years (1.8.2018-31.7.2020), which means that, in general, e.g. the learning outcomes, assessment methods and key content stay unchanged. However, the course syllabus can specify or change how the course is carried out in each realization, such as how the contact sessions are organized, how the assessment methods are weighted or which materials are used.


Learning Outcomes
After completing the course, a student can: (i) explain the main concepts and approaches related to decision making and learning in stochastic time series systems; (ii) read scientific literature to follow the developing field; (iii) implement algorithms such as value iteration and policy gradient.
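As an illustration of the kind of algorithm named in outcome (iii), the following is a minimal value iteration sketch. The three-state MDP, its transition table `P`, the discount `gamma = 0.9` and the function name `value_iteration` are hypothetical choices for this example, not course material. The code repeatedly applies the Bellman optimality backup until the values stop changing, then reads off a greedy policy.

```python
"""Value iteration on a tiny, hypothetical MDP (illustrative sketch only)."""
from typing import Dict, List, Tuple

# Hypothetical MDP: P[s][a] is a list of (probability, next_state, reward) triples.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]

P: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},  # absorbing state, no further reward
}


def q_value(P: Transitions, V: Dict[int, float], s: int, a: int, gamma: float) -> float:
    """Q(s, a) = sum over s' of p(s' | s, a) * (r + gamma * V(s'))."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    """Apply the Bellman optimality backup until values change by less than tol."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            break
    # Greedy policy with respect to the converged values (ties go to the first action).
    policy = {s: max(P[s], key=lambda a, s=s: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(V, policy)
```

For this toy MDP, V(1) converges to 1 (take action 1 and collect the reward) and V(0) to 0.72 / 0.82 ≈ 0.878, with action 1 chosen in both non-terminal states.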

Credits: 5

Schedule: 07.09.2020 - 02.12.2020

Teacher in charge (valid 01.08.2020-31.07.2022): Ville Kyrki

Teacher in charge (applies in this implementation): Ville Kyrki

Contact information for the course (valid 24.08.2020-21.12.2112):

Lecturer (for course registration, lectures, etc.): Ville Kyrki (by email, or after lectures).

TAs (for assignments and the project): Karol Arndt, David Blanco Mulero, Oliver Struckmeier. Contact preferably via Slack.

CEFR level (applies in this implementation):

Language of instruction and studies (valid 01.08.2020-31.07.2022):

Teaching language: English

Languages of study attainment: English


Content
  • Valid 01.08.2020-31.07.2022:

    Modeling uncertainty. Markov decision processes. Model-based reinforcement learning. Model-free reinforcement learning. Function approximation. Policy gradient. Partially observable Markov decision processes.
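Of the topics above, policy gradient methods are perhaps the hardest to picture from a list of keywords. The sketch below is a minimal REINFORCE (Monte Carlo policy gradient) update with a softmax policy and a running-average baseline, applied to a hypothetical two-armed bandit, which is the one-step special case of an episode. The arm means, learning rate, episode count and function names are assumptions chosen for illustration, not values from the course.

```python
"""REINFORCE (Monte Carlo policy gradient) on a hypothetical two-armed bandit."""
import math
import random

# Hypothetical bandit: Bernoulli rewards with these success probabilities per arm.
ARM_MEANS = [0.2, 0.8]


def softmax(prefs):
    """Numerically stable softmax over action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce(episodes=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(ARM_MEANS)  # policy parameters (softmax preferences)
    baseline = 0.0                  # running average reward, reduces gradient variance
    for t in range(1, episodes + 1):
        probs = softmax(theta)
        a = rng.choices(range(len(probs)), weights=probs)[0]  # sample a ~ pi(. | theta)
        r = 1.0 if rng.random() < ARM_MEANS[a] else 0.0       # one-step episode return
        # For a softmax policy: d log pi(a) / d theta_b = 1[a == b] - pi(b)
        for b in range(len(theta)):
            grad_log = (1.0 if b == a else 0.0) - probs[b]
            theta[b] += lr * (r - baseline) * grad_log
        baseline += (r - baseline) / t
    return softmax(theta)


probs = reinforce()
print(probs)
```

With these settings the learned policy puts most of its probability on the better arm (arm 1). Subtracting the baseline leaves the expected gradient unchanged, because the baseline does not depend on the chosen action, but it makes individual updates less noisy.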

Assessment Methods and Criteria
  • Valid 01.08.2020-31.07.2022:

    Assignments and project work.

  • Applies in this implementation:

    Grading 0-5. Quizzes 20 %, Assignments 50 %, Project 30 %. No exam.

    To pass: Completed assignments. Completed project.
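The stated weights combine into a weighted sum. The sketch below assumes each component is scored on a 0-100 scale; that scale, the function names `weighted_score` and `passes`, and the example scores are hypothetical. The mapping from the weighted total to the 0-5 grade is not specified here, so it is left out.

```python
"""Weighted course total from the stated weights (illustrative sketch only)."""

# Weights as stated: Quizzes 20 %, Assignments 50 %, Project 30 %.
WEIGHTS = {"quizzes": 0.20, "assignments": 0.50, "project": 0.30}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights cover the whole grade


def weighted_score(scores: dict) -> float:
    """Weighted total. Assumption: each component score is on a 0-100 scale."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)


def passes(assignments_completed: bool, project_completed: bool) -> bool:
    """Pass requirement as stated: completed assignments and a completed project."""
    return assignments_completed and project_completed


total = weighted_score({"quizzes": 80, "assignments": 90, "project": 70})
print(round(total, 2))  # 0.2*80 + 0.5*90 + 0.3*70 = 16 + 45 + 21 = 82.0
```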

Workload
  • Valid 01.08.2020-31.07.2022:

    Contact teaching, independent study, assignments, project

    Contact teaching 56 h

    Independent study 74 h


Study Material
  • Valid 01.08.2020-31.07.2022:

    Lecture notes. On-line material.

  • Applies in this implementation:

    Lecture slides.

    Sutton & Barto, "Reinforcement Learning: An Introduction" (parts).

    LaValle, "Planning Algorithms" (parts).

    All available on-line.

Prerequisites
  • Valid 01.08.2020-31.07.2022:

    Required: Basic programming skills, basic calculus (gradient), basic vector and matrix algebra, basic probability (random variables, expectation)
    Recommended: Artificial Intelligence
    Useful: Machine learning - basic principles, Digital and optimal control, Stochastics and estimation



Registration and further information