The course covers general topics in data mining, including pattern discovery, similarity search, data clustering, graph mining, ranking and ordering problems, stream computation, and distributed data analysis with frameworks such as MapReduce.
Some of the course topics will be covered from the textbook:
Leskovec, Rajaraman, and Ullman: Mining of Massive Datasets,
available from Cambridge University Press and online.
Additional reading material will be posted on the course webpage.
The syllabus will cover:
- Introduction to data mining.
- Distance functions and embeddings.
- Similarity search, locality-sensitive hashing, and dimensionality reduction.
- Pattern mining and frequent itemset mining.
- Analysis of sequential data.
- Link analysis and methods for ordering data.
- Approximation algorithms for clustering problems, such as k-means and k-median.
- Graph partitioning, spectral graph analysis, and spectral data analysis.
- Data streams.
The syllabus will be similar to that of last year's course, given under the code T-61.5060, although some material will be updated and revised.
Course meetings are on Mon and Tue, 4-6pm, in T1. Exercise sessions are scheduled for Thu, 2-4pm, in T2.
Course instructor: Aristides Gionis, email@example.com
Teaching assistants: Han Xiao, firstname.lastname@example.org, and Orestis Kostakis, email@example.com
Office hours: by appointment.