This course provides an introduction to some of the central topics in computational learning theory, a field that seeks to answer the question "can machines learn?" from the perspective of theoretical computer science. We will study mathematical and computational models of learning that admit rigorous analysis of learning algorithms, with a focus on error bounds and computational efficiency.
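As a small taste of the kind of analysis the course develops, the following sketch (illustrative only; the threshold class, the uniform data distribution, and all names below are assumptions made for this example, not course code) learns an unknown threshold on [0, 1] from random labeled examples and estimates how the true error shrinks as the sample size grows:

```python
import random

def learn_threshold(sample):
    # ERM-style learner: place the threshold at the largest
    # negatively labeled point seen in the sample.
    negatives = [x for x, y in sample if y == 0]
    return max(negatives, default=0.0)

def true_error(h, t):
    # Under the uniform distribution on [0, 1], the hypothesis "label 1 iff
    # x > h" disagrees with the target "label 1 iff x > t" on a set of
    # probability |t - h|.
    return abs(t - h)

random.seed(0)
t = 0.3  # unknown target threshold (the learner never sees it directly)
avg_err = {}
for m in (10, 100, 1000):
    trials = []
    for _ in range(200):  # average over 200 independent random samples
        sample = [(x, int(x > t)) for x in (random.random() for _ in range(m))]
        trials.append(true_error(learn_threshold(sample), t))
    avg_err[m] = sum(trials) / len(trials)
    print(m, round(avg_err[m], 4))
```

On a typical run the average error drops by roughly a factor of ten each time the sample size does, matching the O(1/m) sample-complexity behavior that PAC-style bounds predict for one-dimensional thresholds.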
Prerequisites
- Basic discrete math and probability.
- Big-O notation and basic analysis of algorithms.
- Familiarity with mathematical proof principles.
- Basic knowledge of the theory of NP-completeness.
Course material
- Michael Kearns and Umesh Vazirani, An Introduction to Computational Learning Theory, The MIT Press, 1994. Available as an e-book via the Aalto online library (Aalto credentials required).
- Shai Shalev-Shwartz and Shai Ben-David, Understanding Machine Learning: From Theory to Algorithms, Cambridge University Press, 2014. Available online: http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf
- Robert E. Schapire and Yoav Freund, Boosting: Foundations and Algorithms, The MIT Press, 2012. Available as an e-book via the Aalto online library (Aalto credentials required).
Lecture schedule
- Lecture 1: Basic notions and introduction to the PAC learning model
- Lecture 2: Uniform and non-uniform learnability
- Lecture 3: Vapnik-Chervonenkis dimension and the sample complexity of learning
- Lecture 4: Rademacher complexity and covering numbers
- Lecture 5: Weak and strong learning
- Lecture 6: Learning in the presence of noise: statistical query learning model
- Lecture 7: Submodular optimization
- Lecture 8: Applications of submodularity in machine learning
- Lecture 9: Online learning: mistake-bound models
- Lecture 10: Online learning: no-regret models
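As a preview of the mistake-bound model from Lecture 9, here is a minimal sketch of the classic Halving algorithm over a finite hypothesis class (the threshold class and the example stream below are made up for illustration; the log2|H| mistake bound is the standard guarantee in the realizable setting):

```python
def halving(hypotheses, stream):
    """Halving algorithm: predict by majority vote of the version space.
    In the realizable setting (the target is in `hypotheses`), each mistake
    eliminates at least half of the version space, so the number of
    mistakes is at most log2(len(hypotheses))."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        votes = sum(h(x) for h in version_space)
        prediction = int(2 * votes >= len(version_space))  # majority vote
        if prediction != y:
            mistakes += 1
        # keep only the hypotheses consistent with the revealed label
        version_space = [h for h in version_space if h(x) == y]
    return mistakes

# Hypothesis class: integer thresholds on {0, ..., 15}; target is t = 11.
H = [(lambda x, t=t: int(x >= t)) for t in range(16)]
target = H[11]
stream = [(x, target(x)) for x in [8, 13, 10, 11, 3, 12, 9, 11, 14, 10]]
mistakes_made = halving(H, stream)
print(mistakes_made)  # the Halving bound guarantees at most log2(16) = 4 mistakes
```

The point of the model is that the bound holds for any order of examples, with no distributional assumptions, which is the theme Lectures 9-10 develop.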
Course requirements
- complete and return 3 problem sets (homework assignments)
- read a research paper from a recommended list and write a paper summary
Lecturers
- Cigdem Aslay
- Aristides Gionis
Other reading material
- Valiant, L. G. (1984). "A theory of the learnable". In Proceedings of the ACM Symposium on Theory of Computing.
- Vapnik, V. N., & Chervonenkis, A. Y. (1971). "On the uniform convergence of relative frequencies of events to their probabilities". Theory of Probability and its Applications.
- Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1989). "Learnability and the Vapnik-Chervonenkis dimension". Journal of the ACM, 36(4), 929-965.
- Haussler, D. (1992). "Decision Theoretic Generalizations of the PAC Model for Neural Net and Other Learning Applications". Information and Computation.
- Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1987). "Occam's razor". Information Processing Letters, 24(6), 377-380.
- Rivest, R. (1987). "Learning decision lists". Machine learning, 2(3), 229-246.
- Ehrenfeucht, A., & Haussler, D. (1989). "Learning decision trees from random examples." Information and Computation, 82(3), 231-246.
- Littlestone, N. (1988). "Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm". Machine Learning, 2(4), 285-318.
- Schapire, R. E. (2003). "The boosting approach to machine learning: An overview". In Nonlinear Estimation and Classification, Springer.
- Freund, Y., & Schapire, R. E. (1997). "A decision-theoretic generalization of on-line learning and an application to boosting". Journal of Computer and System Sciences, 55(1), 119-139.
- Koltchinskii, V., & Panchenko, D. (2000). "Rademacher processes and bounding the risk of function learning". In High Dimensional Probability II, Springer.
- Valiant, L. G. (2013). Probably Approximately Correct: Nature's Algorithms for Learning and Prospering in a Complex World. Basic Books.