Please note! The course description is confirmed for two academic years, which means that, in general, the learning outcomes, assessment methods, and key content stay unchanged. However, the course syllabus can specify or change how each realization of the course is carried out, such as how the contact sessions are organized, how the assessment methods are weighted, or which materials are used.

LEARNING OUTCOMES

After the course, students have an overview of the main principles and methods of data mining and know how to apply them to real-world problems. They know the most fundamental pattern types and their search methods, including associative and graph mining, the main approaches to clustering high-dimensional and heterogeneous data, basic concepts and techniques in social network analysis and in web and text mining, and how to validate data mining results.

Credits: 5

Schedule: 02.09.2024 - 11.12.2024

Teacher in charge (valid for whole curriculum period):

Teacher in charge (applies in this implementation): Wilhelmiina Hämäläinen

Contact information for the course (applies in this implementation):

CEFR level (valid for whole curriculum period):

Language of instruction and studies (applies in this implementation):

Teaching language: English. Languages of study attainment: English

CONTENT, ASSESSMENT AND WORKLOAD

Content
  • valid for whole curriculum period:

    The course covers fundamental data mining problems, such as pattern discovery, graph mining, and clustering different types of data. Text mining, social network analysis, and special topics will be covered more briefly. The main emphasis is on learning the basic principles of data mining and their application in practice, including method selection, algorithm strategies, validation, and scalability issues.

Assessment Methods and Criteria
  • valid for whole curriculum period:

    Home assignments, exercises, examination.

Workload
  • valid for whole curriculum period:

    Study methods consist of lectures (24 h), exercise sessions (10 h), home assignments (about 40 h), self-study (40 h), and preparation for the exam (20 h). Lectures and exercise sessions are voluntary and can be replaced by self-study.

  • applies in this implementation

    It is hard to separate the work required by exercises from self-study, since many students do them in parallel. The original estimates assumed that you would first study the theory thoroughly and then need to spend relatively little time on exercises (on average 80 min/task). However, you might instead spend only 30 h on self-study (about 2.5 h after each lecture) and then 30 h solving exercises (on average 2 h/task).
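As a rough check of the figures above, the sketch below adds up the listed workload hours and derives the lecture and task counts that the alternative 30 h + 30 h split would imply. The dictionary keys and function names are illustrative only, and the implied counts are inferred from the stated per-lecture and per-task averages; they are not figures given by the course.

```python
# Hours per study method, copied from the workload description.
planned_hours = {
    "lectures": 24,
    "exercise sessions": 10,
    "home assignments": 40,   # stated as "about 40 h"
    "self-study": 40,
    "exam preparation": 20,
}


def total_hours(hours):
    """Sum the hours over all study methods."""
    return sum(hours.values())


def implied_count(total_h, hours_per_item):
    """Number of items implied by a total and a per-item average."""
    return total_h / hours_per_item


total = total_hours(planned_hours)

# Alternative split: 30 h of self-study at about 2.5 h after each lecture,
# and 30 h of exercises at about 2 h per task (assumed to be exact averages).
lectures_implied = implied_count(30, 2.5)
tasks_implied = implied_count(30, 2.0)

print(f"planned total: {total} h")
print(f"implied lectures: {lectures_implied:g}, implied tasks: {tasks_implied:g}")
```

Under these assumptions, the implied number of lectures matches 24 h of lectures at 2 h each.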

DETAILS

Study Material
  • valid for whole curriculum period:

    Course book: Charu C. Aggarwal, Data Mining: The Textbook. Springer, 2015. Lecture slides and external material.

Substitutes for Courses
Prerequisites

FURTHER INFORMATION

Further Information
  • valid for whole curriculum period:

    Teaching Language: English

    Teaching Period: 2024-2025 Autumn I - II
    2025-2026 Autumn I - II

    Registration:

    Participation is subject to a maximum quota of 150 students. Enrollments will be approved manually, based on the prerequisites that students provide and the following prioritization:

    1. Students studying in Computer, Communication and Information Sciences and majoring in Machine Learning, Data Science and Artificial Intelligence; or majoring in Computer Science, track Big Data and Large Scale Computing; or students studying in the ICT Innovation programme and majoring in Data Science.
    2. Students studying in Computer, Communication and Information Sciences and majoring in Computer Science or Security and Cloud Computing; students studying in the Master’s Programme in Security and Cloud Computing, Erasmus Mundus; or students taking a minor in Machine Learning, Data Science and Artificial Intelligence.
    3. Students studying in the Aalto Bachelor's Programme in Science and Technology and majoring in Data Science.
    4. Students studying in the Bachelor's Programme in Science and Technology and majoring in Computer Science.
    5. Other students studying in Computer, Communication and Information Sciences; or students studying in the ICT Innovation programme and majoring in Autonomous Systems and Intelligent Robots or Cloud and Network Infrastructures; or students studying in the Master’s Programme in Communications and Data Science and majoring in Communications Engineering and Data Science.
    6. Other students.

  • applies in this implementation

    Note that the deadline to register for the course is 1st September 2024.