In an unknown environment, a learning agent can base each decision only on a limited number of observations (evidence) about the possible choices. At each step, the agent must decide whether to gather more information about the environment (explore) or to make the best decision given the current information (exploit). This exploration-exploitation trade-off arises whenever decisions are made under uncertainty, and it is an active research topic. Current applications of interest include clinical trials for deciding on the best treatment to give to a patient, online advertising and recommender systems, and game playing.
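To make the explore/exploit decision described above concrete, here is a minimal sketch of an epsilon-greedy strategy on a simulated Bernoulli bandit. It is an illustration only, not material taken from the course: the function name `epsilon_greedy`, the parameters `true_means`, `n_steps`, `epsilon` and `seed`, and the example reward probabilities are all assumptions chosen for this sketch.

```python
import random


def epsilon_greedy(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a simulated Bernoulli bandit (illustrative sketch).

    true_means: hypothetical success probability of each arm, unknown to the agent.
    epsilon: probability of exploring (pulling a uniformly random arm) at each step.
    Returns (pull counts per arm, estimated mean per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of observations collected per arm
    estimates = [0.0] * n_arms   # empirical mean reward per arm
    total_reward = 0.0

    for _ in range(n_steps):
        if rng.random() < epsilon:
            # Explore: gather more information about a random arm.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm that looks best given current estimates.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated environment: reward 1 with probability true_means[arm].
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the empirical mean of the pulled arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return counts, estimates, total_reward


if __name__ == "__main__":
    counts, estimates, total = epsilon_greedy([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated means:", [round(e, 3) for e in estimates])
    print("total reward:", total)
```

With a small `epsilon` the agent spends most steps on the arm that currently looks best and only occasionally samples the others. A larger `epsilon` yields better estimates for every arm at the cost of more pulls on arms that turn out to be inferior.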
Sequential, adaptive procedures make the decision process more flexible, so the learning agent can use resources more efficiently and collect the observations needed to make better-informed decisions. This course will present the current machine learning tools and formulations used to handle this problem. We will study existing approaches and interesting applications of sequential decision-making problems through introductory lectures and discussions of state-of-the-art papers. We will also present related ongoing research in the Probabilistic Machine Learning group, where these techniques are integrated and developed for research projects on personalized medicine and user interaction.
This is a seminar-type course: after an introduction to the topic, we will discuss published research articles on bandit algorithms. Each student will present one or two papers from a list of proposed articles, ranging from fundamental, theoretical articles to application papers and papers extending to other sequential decision-making frameworks. Everyone is expected to have read the paper under discussion before the seminar. Each presentation will be followed by a discussion with all course participants about the contributions of the paper and the questions that remain open. Active participation is strongly encouraged.
The course is mainly aimed at doctoral students and advanced master's students. Note that, due to the format of the course, the number of students is limited to a maximum of 20. Familiarity with the basic principles of machine learning is a plus.
Time and place:
Period II, 31.10.2016-05.12.2016.
Lectures in R030/T3 C206, Mondays, 10:15-12:00.
Extent of the course: 3 ECTS (for presenting papers and active participation), with the possibility of obtaining 5 ECTS by completing one of the proposed projects.
Grades: Pass or Fail.
Instructors: Marta Soare, Tomi Peltola
Advisor: Prof. Samuel Kaski