ELEC-E5431 - Large scale data analysis D, Lecture, 9.1.2024-10.4.2024
Topic outline
Basic information
- Credits: 5 ECTS
- Level: M.Sc. and Doctoral Studies
- Teaching: Periods III-IV (12 x 2h lectures), 9.1.2024 - 10.4.2024
- Exercise sessions: Periods III-IV (6 x 2h)
- Language: English
Course personnel
- Profs. Sergiy Vorobyov, Esa Ollila and Visa Koivunen
- Teaching assistants (TA): Eeli Susan and Xinjue Wang
- Guest Lecturer: TBD
You may contact us by email using the format firstname.lastname@aalto.fi
Schedule: lectures on Tue at 10:15-12:00, exercise sessions on Wed at 12:15-14:00.
- Tue 9.1: Sergiy, Kide big hall Sklodowska-Curie - 1501
- Tue 16.1: Eeli, Maarintie 8, AS6 (recap on vector calculus and basic matrices, using numpy)
- Wed 17.1: Esa, Kide smaller classroom Meitner - 1571 (recap on matrix algebra)
- Tue 23.1: Sergiy, Kide big hall Sklodowska-Curie - 1501
- Tue 30.1: Sergiy, Kide big hall Sklodowska-Curie - 1501
- Wed 31.1: Kide smaller classroom Meitner - 1571
- Tue 6.2: Sergiy, Maarintie 8, AS6
- Tue 13.2: Sergiy, Kide big hall Sklodowska-Curie - 1501
- Wed 14.2: Kide smaller classroom Meitner - 1571
- Exam week
- Tue 27.2: Esa, Kide big hall Sklodowska-Curie - 1501
- Tue 5.3: Esa, Kide big hall Sklodowska-Curie - 1501
- Wed 6.3: Kide smaller classroom Meitner - 1571
- Tue 12.3: Esa, Kide big hall Sklodowska-Curie - 1501
- Tue 19.3: Esa, Kide big hall Sklodowska-Curie - 1501
- Wed 20.3: Kide smaller classroom Meitner - 1571
- Tue 26.3: Visa, Kide big hall Sklodowska-Curie - 1501
- Tue 9.4: Sergiy, Kide big hall Sklodowska-Curie - 1501
- Wed 10.4: Kide smaller classroom Meitner - 1571
NOTE: The room varies between sessions; see the schedule above for the lecturer and location of each lecture and exercise session.
Objectives:
- to give students the tools and training to recognize the problems of processing large scale data that arise in engineering and computer science
- to present the basic theory of such problems, concentrating on results that are useful in computation
- to give students a thorough understanding of how such problems are thought of, modeled and addressed, and some experience in solving them
- to give students the background required to use the methods in their own research work
- to give students sufficient background information in linear algebra, statistical and machine learning tools, optimization, and sparse reconstruction for processing large scale data
- to give students a number of examples of successful applications of these techniques to signal processing of large scale data.
Materials and textbooks
- Course slides, lecture and exercise session notes, videos and codes
There are several useful textbooks (with PDFs available online), such as:
- Jorge Nocedal and Stephen J. Wright, Numerical Optimization (PDF at uci.edu) (mainly Chapters 2, 3, 5, and 7)
- Yurii Nesterov, Lectures on Convex Optimization (PDF at shuyuej.com) (mainly Chapters 1, 2, and 3)
- Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, USA. [ESL] https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf
- Hastie, T., Tibshirani, R. and Wainwright, M. (2015). Statistical Learning with Sparsity: The Lasso and Generalizations. CRC Press. https://web.stanford.edu/~hastie/StatLearnSparsity_files/SLS.pdf
Assessment and grading
- Altogether 3-4 homework assignments. The homework assignments are computer programming assignments, preferably done in Python or MATLAB.
Homeworks will be made available on MyCourses and returned via MyCourses (follow the instructions for each homework).
Grading will be based on the homework assignments.
Possible bonus points for active participation in lectures and exercise sessions.
Contents (tentative, subject to change):
Sergiy:
- Optimization Methods for Large Scale Data Analysis (First-Order Accelerated Methods for Smooth Functions, Extension to Non-Smooth Functions - Sub-gradient Methods)
- Optimization Methods for Huge Scale Data Analysis (Stochastic Gradient Methods, Proximal Gradient Method, Mirror Descent, Frank-Wolfe, ADMM, Block-Coordinate Descent)
- Applications to Data Analysis with Structured Sparsity and Machine Learning
Esa:
- Classification and regression tasks, basic principles
- Lasso and its generalisations
- Decision Trees, Bagging, Random Forests.
- Boosting and its variants
Visa: TBD
Final Grades (PDF)
Introduction and Overview (PDF)
Introduction
Motivation
History
Encompassing Model
Basic Data Analysis Problems
Example: PCA
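For a quick hands-on view of the PCA example, here is a minimal numpy sketch (the synthetic data and variable names are illustrative, not taken from the lecture): the principal directions come from the SVD of the centered data matrix.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # synthetic n x d data

Xc = X - X.mean(axis=0)                   # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                           # rows are the principal directions
explained_var = s**2 / (X.shape[0] - 1)   # variance captured by each direction
scores = Xc @ Vt.T                        # data expressed in the PC basis
print(explained_var)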
Sergiy's Summary Notes (Pony) (PDF)
Everything in short summary-note ("pony") form, covering the computational (optimization) aspects of large-scale data analysis.
Gradient Descent (PDF)
Quadratic minimization problems
Strongly convex and smooth problems
Convex and smooth problems
Nonconvex problems
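As a minimal companion to these notes, a gradient descent sketch for a smooth least-squares problem (the problem instance and the fixed step size 1/L are illustrative choices, not taken from the slides):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)

# f(x) = 0.5 * ||Ax - b||^2 is smooth with constant L = ||A||_2^2
L = np.linalg.norm(A, 2) ** 2
x = np.zeros(20)
for _ in range(500):
    grad = A.T @ (A @ x - b)  # gradient of f at x
    x = x - grad / L          # fixed step size 1/L

x_star, *_ = np.linalg.lstsq(A, b, rcond=None)  # reference least-squares solution
print(np.linalg.norm(x - x_star))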
Subgradient Methods (PDF)
Steepest descent
Subgradients
Projected subgradient descent:
(i) Convex and Lipschitz problems
(ii) Strongly convex and Lipschitz problems
Convex-concave saddle point problems
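A minimal projected subgradient sketch for a convex, nonsmooth, Lipschitz objective over a ball constraint (the problem, the radius, and the diminishing step size are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
b = rng.standard_normal(50)

def project_ball(x, r=1.0):
    # Euclidean projection onto {x : ||x||_2 <= r}
    n = np.linalg.norm(x)
    return x if n <= r else r * x / n

# minimize ||Ax - b||_1 subject to ||x||_2 <= 1
x = np.zeros(20)
for k in range(1, 2001):
    g = A.T @ np.sign(A @ x - b)                 # a subgradient of ||Ax - b||_1
    x = project_ball(x - 0.01 / np.sqrt(k) * g)  # diminishing step ~ 1/sqrt(k), then project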
Mirror Descent (PDF)
Mirror descent
Bregman divergence
Alternative forms of mirror descent
Convergence analysis
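A minimal mirror descent sketch with the entropy mirror map on the probability simplex, where the Bregman divergence is the KL divergence and the update becomes multiplicative (exponentiated gradient); the problem instance and step size are illustrative:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)

# minimize f(x) = 0.5 * ||Ax - b||^2 over the probability simplex
x = np.full(10, 0.1)   # start at the uniform distribution
eta = 0.05
for _ in range(1000):
    g = A.T @ (A @ x - b)
    x = x * np.exp(-eta * g)  # entropy mirror map gives a multiplicative update
    x = x / x.sum()           # renormalize back onto the simplex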
Proximal Gradient Descent (PDF)
Proximal gradient descent for composite functions
Proximal mapping / operator
Convergence analysis
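A minimal proximal gradient (ISTA) sketch for the lasso composite objective 0.5 * ||Ax - b||^2 + lam * ||x||_1, where the proximal operator of the l1 term is soft-thresholding (the instance and lam are illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
x_true = np.zeros(50); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(100)

lam = 0.1
L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(50)
for _ in range(500):
    x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)  # gradient step, then prox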
Accelerated Gradient Methods (PDF)
Heavy-ball methods
Nesterov’s accelerated gradient methods
Accelerated proximal gradient methods (FISTA)
Convergence analysis
Lower bounds
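A minimal FISTA sketch: the same proximal step as ISTA above, plus Nesterov-style extrapolation (a self-contained repeat of the lasso setup; all constants are illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
x_true = np.zeros(50); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(100)
lam = 0.1
L = np.linalg.norm(A, 2) ** 2

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(50); y = x.copy(); t = 1.0
for _ in range(300):
    x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)  # prox-gradient step at y
    t_new = (1 + np.sqrt(1 + 4 * t**2)) / 2                     # momentum schedule
    y = x_new + (t - 1) / t_new * (x_new - x)                   # extrapolation
    x, t = x_new, t_new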
Dual and Primal-Dual Proximal Gradient Methods (PDF)
Dual proximal gradient method
Primal-dual proximal gradient method
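For a concrete reference, a minimal sketch of one well-known primal-dual proximal method (the Chambolle-Pock iteration) applied to the lasso, split as g(x) = lam * ||x||_1 plus f(Ax) with f(y) = 0.5 * ||y - b||^2; whether the lecture uses this exact variant is an assumption, and all constants are illustrative:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
b = rng.standard_normal(100)
lam = 0.1

tau = sigma = 0.9 / np.linalg.norm(A, 2)  # ensures tau * sigma * ||A||_2^2 < 1

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x = np.zeros(50); xbar = x.copy(); y = np.zeros(100)
for _ in range(1000):
    y = (y + sigma * (A @ xbar - b)) / (1 + sigma)          # prox of sigma * f* (dual step)
    x_new = soft_threshold(x - tau * (A.T @ y), tau * lam)  # prox of tau * g (primal step)
    xbar = 2 * x_new - x                                    # extrapolation (theta = 1)
    x = x_new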
ADMM (PDF)
Augmented Lagrangian method
Alternating direction method of multipliers
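A minimal ADMM sketch for the lasso with the standard splitting x = z (the penalty parameter rho and the instance are illustrative):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 50))
b = rng.standard_normal(100)
lam, rho = 0.1, 1.0

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# factor the x-update system (A^T A + rho I) once, outside the loop
C = np.linalg.cholesky(A.T @ A + rho * np.eye(50))
Atb = A.T @ b

x = np.zeros(50); z = np.zeros(50); u = np.zeros(50)
for _ in range(200):
    rhs = Atb + rho * (z - u)
    x = np.linalg.solve(C.T, np.linalg.solve(C, rhs))  # x-update: ridge-type linear solve
    z = soft_threshold(x + u, lam / rho)               # z-update: prox of the l1 term
    u = u + x - z                                      # scaled dual update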
Stochastic Gradient Descent (SGD) (PDF)
Stochastic gradient descent (stochastic approximation)
Convergence analysis
Reducing variance via iterate averaging
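A minimal SGD sketch for a least-squares loss, with Polyak-Ruppert iterate averaging as the variance-reduction device (the sampling scheme, step sizes, and instance are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

x = np.zeros(d)
x_avg = np.zeros(d)
for k in range(1, 50001):
    i = rng.integers(n)              # draw one sample uniformly
    g = (A[i] @ x - b[i]) * A[i]     # unbiased estimate of the full gradient
    x = x - 0.05 / np.sqrt(k) * g    # diminishing step size ~ 1/sqrt(k)
    x_avg += (x - x_avg) / k         # running average of the iterates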
Variance Reduction for SGD (PDF)
Stochastic variance reduced gradient (SVRG): Convergence analysis for strongly convex problems
Stochastic recursive gradient algorithm (SARAH): Convergence analysis for nonconvex problems
Other variance reduced stochastic methods:
(i) Stochastic dual coordinate ascent (SDCA)
(ii) SAGA
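A minimal SVRG sketch for the same least-squares setting: each inner step corrects the stochastic gradient using a full gradient computed at a periodic snapshot (epoch length, step size, and instance are illustrative):

import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

x = np.zeros(d)
eta = 0.005
for epoch in range(30):
    x_snap = x.copy()
    mu = A.T @ (A @ x_snap - b) / n              # full gradient at the snapshot
    for _ in range(n):
        i = rng.integers(n)
        gi = (A[i] @ x - b[i]) * A[i]            # stochastic gradient at the current x
        gi_snap = (A[i] @ x_snap - b[i]) * A[i]  # same sample, at the snapshot
        x = x - eta * (gi - gi_snap + mu)        # variance-reduced update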
Adam (PDF)
Introduction to the use of optimization for machine learning: from GD to SGD to Momentum to Adam.
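A minimal Adam sketch on a least-squares objective, to make the GD to SGD to momentum to Adam progression concrete (the hyperparameters are the common defaults; the full-batch gradient is a simplification, minibatches would be used in practice):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 20))
b = rng.standard_normal(200)

x = np.zeros(20)
m = np.zeros(20); v = np.zeros(20)
alpha, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
for k in range(1, 2001):
    g = A.T @ (A @ x - b) / 200           # gradient of the mean squared loss
    m = beta1 * m + (1 - beta1) * g       # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * g**2    # second-moment estimate
    m_hat = m / (1 - beta1**k)            # bias corrections
    v_hat = v / (1 - beta2**k)
    x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)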
Homework Assignment 1
Due February 15, 2024.
Homework Assignment 2
Lecture on multiple hypothesis testing and sequential inference (PDF)
Introduction to Multiple Hypothesis Testing (MHT), sequential detection, and change-point detection for large scale and streaming data, by Visa Koivunen (DICE/ELEC)
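As a small companion to this lecture, a sketch of the Benjamini-Hochberg step-up procedure for controlling the false discovery rate in multiple testing (the synthetic p-values and the level q are illustrative; the procedures covered in the lecture may differ):

import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    # reject the k smallest p-values for the largest k with p_(k) <= k * q / m
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()   # largest index passing the threshold
        reject[order[:k + 1]] = True
    return reject

rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=90), 1e-4 * rng.uniform(size=10)])
print(benjamini_hochberg(pvals).sum(), "rejections at q = 0.05")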