Topic outline

  • General

    All information on the course organization, learning material, tasks and important announcements will appear here. You can find more details on prerequisites and practical arrangements in the slides of the first lecture.


    Overview


    The course gives an overview of the main principles and methods of data mining and how to apply them to real-world problems. It introduces the most fundamental pattern types and their search methods (including association and graph patterns), the main approaches to clustering high-dimensional and/or heterogeneous data, web and text mining, social community detection, and validation of data mining results.

    Prerequisites


    Good programming skills (CS-A1110 or equivalent), data structures and algorithms (CS-A1140 or equivalent), basic concepts and techniques of probability and statistics (MS-A050* or equivalent) and linear algebra (MS-A00* or equivalent). Statistical inference (MS-C1620 or equivalent) is recommended. We suggest taking the prerequisite test (see section Prerequisite test), which will help you evaluate whether you need to review something.


    Material


    The course is based on the textbook Charu C. Aggarwal: Data Mining: The Textbook. Springer, 2015. The e-book is available in the Aalto library (log in to aalto-primo). In addition, there will be some external material (linked to the course page). The learning material for each topic will be listed in section Lectures, under each lecture.

    Lecture slides, exercise tasks and other material will be added here in MyCourses.


    Workload


    The expected average workload (about 135h) consists of 34–36h of contact sessions (lectures and exercises), 20h of homework, 20h of exam preparation, and about 60h divided between solving exercises and self-studying (either 20h and 40h, or 30h and 30h, respectively). Since study styles differ, it is hard to separate self-studying from exercise solving: some people prefer to first study the theory thoroughly and then solve the exercise tasks fairly quickly (on average 80 min/task), while others do both more or less in parallel. In the latter case, you might spend only 30h self-studying (2.5h after each lecture) and then 30h on exercises (on average 2h/task).
    Similarly, you will need more self-study time if you skip lectures or exercise sessions. Whatever learning style you follow, we recommend reserving some time each week for self-study alone (at least reading the assigned book sections and checking that you understand everything in the slides), since the exercises do not cover everything.


    Grading

    Course performance consists of four elements:

    1. solving individual exercises and active participation in exercise groups (15 tasks in 5 sessions, max 15p)
    2. submitting homework assignments in groups of 2–3 students (5 tasks, max 10p)
    3. final exam Wed 11.12. 13:00–16:00 (max 24p)
    4. prerequisite test (max 1p)
    Total: max 50p

    The course grade is based on the sum of the points in all four categories above. To pass the course, you must get at least 50% of the total points (25p) and at least 50% of the exam points (12p).


    Communication


    All important course-related announcements are published in MyCourses announcements (visible on this page and, by default, also emailed to course participants). For wider discussion, questions and advising, we have a Zulip chat at https://mdm2024.zulip.aalto.fi. In addition, there are on-campus advising sessions (see section Advising sessions).

    You are encouraged to ask questions in Zulip, or during and after the lectures and exercise sessions. To avoid email chaos, please use email only for personal matters that you cannot raise elsewhere. Asking in these channels will also get you a faster response.


    More information on practical arrangements in the first lecture!