
  • Note: The course is given remotely in autumn 2021. The lectures and exercise sessions are held over Zoom, and the Zoom links will be added here (under the sections Lectures and Exercises) before each teaching session begins. Only the exam requires physical presence.

    Overview

    The course gives an overview of the main principles and methods of data mining and how to apply them to real-world problems. It introduces the most fundamental pattern types and their search methods, including associative and graph patterns, the main approaches to clustering high-dimensional and/or heterogeneous data, web and text mining, social community detection, and the validation of data mining results.


    Prerequisites

    Good programming skills (CS-A1110 or equivalent), data structures and algorithms (CS-A1140 or equivalent), and basic knowledge of probability theory and statistics (MS-A050* or equivalent). Linear algebra is not an official requirement, but some basic knowledge of matrices is needed.

    Material

    The course is based on the textbook Charu C. Aggarwal: Data Mining: The Textbook. Springer 2015. The e-book is available in the Aalto library (log in to aalto-primo). In addition, there will be some external material (linked from the course page). The learning material for each topic will be listed in the section Lectures, under each lecture.

    Lecture notes, links to video recordings and other material will be added here in MyCourses.


    Workload

    The average workload (about 135h) consists of 32h of contact sessions (lectures and exercises), 45h of home assignments, about 25-28h of project work, 22h of self-study and 8-10h of exam preparation. We suggest that everybody self-studies for about 2h after each lecture. That way the exercise sessions are most rewarding, the assignments go more easily, and there is little work left to prepare for the exam. If you skip lectures or exercises, you will need to self-study more to compensate.

    Grading

    Course performance consists of three elements:
    • four graded home assignments (about 15 tasks)
    • project work
    • final exam

    The course grade is based on a weighted sum of the points in all three categories (weights 30% + 20% + 50% = 100%, in the order listed above). To pass the course, you need at least 50% of the total points and at least one third of the maximum points in each category.
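    The pass rule combines an overall threshold with a per-category minimum, so a high total alone is not enough. The short Python sketch below illustrates the rule. It assumes that "50% of total points" means the weighted percentage, and that each category score is expressed as a fraction of that category's maximum. The function names weighted_total and passes are hypothetical and this is not an official grading tool.

```python
# Hypothetical illustration of the pass rule, not an official grading tool.
# Assumption: each category score is a fraction (0..1) of that category's max points,
# and "50% of total points" refers to the weighted total.
WEIGHTS = {"assignments": 0.30, "project": 0.20, "exam": 0.50}


def weighted_total(fractions: dict[str, float]) -> float:
    """Weighted sum of per-category scores (each a fraction of the category max)."""
    return sum(WEIGHTS[c] * fractions[c] for c in WEIGHTS)


def passes(fractions: dict[str, float]) -> bool:
    """At least 50% overall and at least one third of max points in every category."""
    overall_ok = weighted_total(fractions) >= 0.5
    each_ok = all(fractions[c] >= 1 / 3 for c in WEIGHTS)
    return overall_ok and each_ok


if __name__ == "__main__":
    # 0.3*0.8 + 0.2*0.7 + 0.5*0.4 = 0.58, and every category is above 1/3: pass.
    print(passes({"assignments": 0.8, "project": 0.7, "exam": 0.4}))
    # Total is 0.65, but the exam score is below 1/3 of its max: fail.
    print(passes({"assignments": 1.0, "project": 1.0, "exam": 0.3}))
```

    The second example shows why the per-category minimum matters: full points in assignments and project cannot compensate for an exam score below one third.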


    Communication

    All important course-related announcements are published in MyCourses announcements (visible on this page and, by default, also emailed to course participants). For wider discussion, questions and advising, we use the Zulip chat at https://mdm2021.zulip.cs.aalto.fi/.

    You are encouraged to ask questions during lectures and exercise sessions, and there are also dedicated Zoom sessions for advising. Please use email only for personal matters that you cannot raise elsewhere.

    More information on practical arrangements will be given in the first lecture!