Topic outline

  • Overview

    The course includes a final project in which you apply the knowledge gained throughout the course to a specific problem.

    Course Project

    In the project work, we will implement more advanced RL algorithms and apply them to continuous control tasks. The project work consists of two parts. In the first part, you will implement two widely used reinforcement learning algorithms, TD3 (https://arxiv.org/pdf/1802.09477.pdf) and PPO (https://arxiv.org/abs/1707.06347). We will provide base code for this part so that you can get started easily. After finishing this part, you will be able to train a policy to balance a cart pole and to make a HalfCheetah run forward.
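
    As a rough illustration of the kind of component Part I involves, the following is a minimal sketch of the clipped surrogate objective described in the PPO paper, written in plain Python. It is not the course base code: the function name ppo_clip_objective, its arguments, and the default clip_eps=0.2 are assumptions chosen for this example only.

```python
import math


def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    Hypothetical helper for illustration; all names are assumptions.
    new_logps / old_logps: log-probabilities of the taken actions under
    the current policy and the policy that collected the data.
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the current and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to the interval [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

    When the two policies are identical, every ratio equals 1 and the objective reduces to the mean advantage. In an actual implementation this quantity would typically be computed on batched tensors and negated to obtain a loss for gradient descent.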

    In the second part, you need to read some research papers and implement their proposed algorithms based on the code finished in Part I. The candidate algorithms in Part II include
    You can choose one of them according to your preference, study the paper, and implement the algorithm. For the listed algorithms, we will provide a reference training curve. If you are interested in another algorithm, you may choose it for Part II instead, but we cannot offer much help with implementing it.

    The project work is to be done in groups of two students. If you need to find a partner for the project, please join the project channel on Slack and advertise yourself. The deadline for both the course project and the alternative project is 05.12.2021 at 23:55.

    Alternative Project

    Alternatively, students can propose their own project topic. This option is mainly aimed at PhD students who want to apply reinforcement learning to their own field, but Master's students are also encouraged to do so.

    Alternative project topics are individual. The project proposal needs to be submitted and will be evaluated by the course staff; work on the project can start once the proposal is approved.

    The deadline for the alternative course project proposal is 29.10.2021 at 23:55.

    Grading

    The course project grade accounts for 30% of the final grade of the course.