MS-E1622 - Algebraic Methods in Data Science, Lecture, 13.1.2023-14.4.2023
This course space end date is set to 14.04.2023.
Schedule
Lectures: Fridays 10:15-12:00 in M234 (M3) by Kaie Kubjas
Exercise sessions: Fridays 14:15-16:00 in Y324a by Muhammad Ardiyansyah
Content
The planned lecture topics are listed below. They may be subject to change.
Week 1: Euclidean distance geometry and semidefinite programming
Week 2: 3D genome reconstruction and numerical algebraic geometry
Week 3: Nonnegative matrix factorizations
Week 4: Nonnegative matrix factorizations continued
Week 5: Tensors and tensor decompositions
Week 6: Topological data analysis and persistent homology
Week 7 (exams week): No classes
Week 8: No lecture, work on projects, project description due
Week 9: Algebraic methods in optimization
Week 10: Conditional independence and primary decompositions
Week 11: Undirected graphical models
Week 12: No lecture, work on projects
Week 13: Presentations of projects, project reports due
Week 14 (exams week): No classes
Organization
Lectures and exercise sessions take place in person. Slides or lecture notes for each lecture will be posted in MyCourses at the latest after the lecture.
Homework
There will be up to 7 homework assignments. Homework assignments contain exercises to be solved by hand or with computer algebra software. All homework is submitted through MyCourses as a single PDF file. Code can be submitted as a separate file.
Project
The project can consist of reading a paper, working on a small research project, or applying algebraic methods to a data set, and in each case writing a report on it.
The deadline to submit your project title and the names of the group members is March 3 at 20:00. The deadline to submit a 1-page project introduction/description is March 10 at 20:00. The project introduction/description can be used as the first page of the final write-up. Please submit both the title (as text) and the description (as a file) under Homework in MyCourses, where there is a separate submission box for each.
Project reports are due by the end of week 13, and short presentations take place during week 13.
Groups can have 1-3 members. The write-up must be at least 4 pages long for a group of 1-2 people and at least 6 pages long for a group of 3.
Alternative project
As an alternative to the project described above, one can participate in the Eric and Wendy Schmidt Center's cancer immunotherapy data science challenge in teams of at most 5 members: https://go.topcoder.com/schmidtcentercancerchallenge/
This challenge takes place in January and February. If you are interested in this option, please send an email to the course instructor to coordinate forming a team.
Grade
This course is graded pass/fail. To pass the course, one must receive at least 60% of the maximum possible points on the homework sets and successfully complete the project.
Lecture materials
There is no standard textbook for this course, and the material is combined from different sources. Slides or lecture notes will be made available at the latest after each lecture. Sources used include:
- Blekherman, Grigoriy, Pablo A. Parrilo, and Rekha R. Thomas, eds. Semidefinite optimization and convex algebraic geometry. Society for Industrial and Applied Mathematics, 2012.
- Gillis, Nicolas. Nonnegative matrix factorization. Society for Industrial and Applied Mathematics, 2020.
- Kolda, Tamara G., and Brett W. Bader. "Tensor decompositions and applications." SIAM Review 51.3 (2009): 455-500.
- Otter, Nina, et al. "A roadmap for the computation of persistent homology." EPJ Data Science 6 (2017): 1-38.
- Sullivant, Seth. Algebraic statistics. Vol. 194. American Mathematical Society, 2018.