Please note! The course description is confirmed for two academic years, which means that, in general, the learning outcomes, assessment methods and key content stay unchanged. However, the course syllabus may specify or change how each realization of the course is carried out, for example how the contact sessions are organized, how the assessment methods are weighted, or which materials are used.

LEARNING OUTCOMES

After the course, students have an overview of the main principles and methods of data mining and know how to apply them to real-world problems. They know the most fundamental pattern types and their search methods, including association, graph and sequence mining, the main approaches to clustering high-dimensional and heterogeneous data, and how to validate data mining results.

Credits: 5

Schedule: 07.09.2020 - 24.12.2020

Teacher in charge (valid 01.08.2020-31.07.2022): Wilhelmiina Hämäläinen

Teacher in charge (applies in this implementation): Wilhelmiina Hämäläinen

Contact information for the course (valid 28.08.2020-21.12.2112):

  • Course forum (for any questions on course contents, exercises, assignments, etc.); the address will be announced later
  • Email to the lecturer: Wilhelmiina Hämäläinen wilhelmiina.hamalainen@aalto.fi
  • Teaching assistants:

    Martino Ciaperoni martino.ciaperoni@aalto.fi
    Zhang Jun jun.1.zhang@aalto.fi
    Oleg Vlasovetc oleg.vlasovetc@aalto.fi

  • By Zoom during exercises and after lectures


CEFR level (applies in this implementation):

Language of instruction and studies (valid 01.08.2020-31.07.2022):

Teaching language: English

Languages of study attainment: English

CONTENT, ASSESSMENT AND WORKLOAD

Content
  • Valid 01.08.2020-31.07.2022:

    The course covers fundamental data mining problems, such as pattern discovery, graph mining, and clustering different types of data. The main emphasis is on learning the basic principles of data mining and their application in practice, including method selection, validation, and scalability issues.

  • Applies in this implementation:

    Syllabus

    • Introduction to data mining
    • Data preprocessing
    • Distance and similarity
    • Clustering (hierarchical, spectral, graph-based, ... + evaluation)
    • Association mining
    • Graph mining
    • Web mining and recommendation systems
    • Social network analysis
    • Text mining
    • Outlier detection


Assessment Methods and Criteria
  • Valid 01.08.2020-31.07.2022:

    Home assignments, project work, examination.

  • Applies in this implementation:

    Three types of assignments:

    • 3 graded homeworks (period 1)
    • project work (period 2)
    • final exam (period 2)


    The course grade is based on the sum of points in all categories
    (30%+35%+35%=100%). To pass the course, one needs to get at least 25% of the maximum grade in each category. When grading assignments, we will use an "I-don't-know policy", which means that "I don't know" answers receive 15% of the grade.
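
    The arithmetic of this scheme can be illustrated with a minimal sketch. It assumes that the weights 30%, 35% and 35% apply to homeworks, project work and exam in the order the categories are listed, and that the 15% applies to the maximum points of the question answered with "I don't know". The names `WEIGHTS`, `idk_points` and `course_result` are hypothetical, and the sketch only checks the per-category minimum; it does not convert points into a final grade.

```python
# Weights per category. Assumption: they follow the order in which the
# categories are listed (homeworks, project work, final exam).
WEIGHTS = {"homeworks": 0.30, "project": 0.35, "exam": 0.35}
MIN_FRACTION = 0.25  # minimum share of the maximum required in each category
IDK_FRACTION = 0.15  # share awarded for an "I don't know" answer


def idk_points(max_points):
    """Points for an "I don't know" answer (assumed: 15% of the question's maximum)."""
    return IDK_FRACTION * max_points


def course_result(fractions):
    """Return (weighted total in percent, whether every category minimum is met).

    `fractions` maps each category to the share of its maximum that
    was earned, as a number between 0 and 1.
    """
    total = 100 * sum(WEIGHTS[c] * fractions[c] for c in WEIGHTS)
    meets_minimums = all(fractions[c] >= MIN_FRACTION for c in WEIGHTS)
    return total, meets_minimums


# Hypothetical student: strong homeworks, decent project, weak exam.
total, meets_minimums = course_result(
    {"homeworks": 0.8, "project": 0.6, "exam": 0.2}
)
print(round(total, 1), meets_minimums)
```

    In this hypothetical case the weighted total is about 52%, but the exam share of 20% is below the 25% minimum, so the per-category requirement is not met.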



Workload
  • Valid 01.08.2020-31.07.2022:

    Contact teaching: 24 h lectures + 12 h exercises; self-study: 90-100 h (home assignments, project work, exam preparation).

DETAILS

Study Material
  • Valid 01.08.2020-31.07.2022:

    Lecture slides and external material. The course book will be announced later.

  • Applies in this implementation:

    Charu C. Aggarwal: Data Mining: The Textbook, Springer 2015.
    E-book available in Aalto library: https://aalto.finna.fi/Record/alli.773205

Substitutes for Courses
  • Valid 01.08.2020-31.07.2022:

    Substitutes CS-E4600 Algorithmic Methods of Data Mining

Prerequisites
  • Valid 01.08.2020-31.07.2022:

    Programming skills (CS-A1110 or equivalent), data structures and algorithms (CS-A1140 or equivalent), basic concepts of probability and statistics (MS-A050* or equivalent).