ECE 598MR: Statistical Learning Theory (Fall 2015)

Schedule

The schedule will be updated and revised as the course progresses. Each topic will come with links to reference materials; key references will be highlighted. To get a rough idea of the material, check out the schedules from past offerings: Fall 13, Fall 14.

Tue, Aug 25 [notes]

Introduction, history, overview, and administrivia.

Olivier Bousquet, Stéphane Boucheron, and Gábor Lugosi, Introduction to statistical learning theory, in Advanced Lectures on Machine Learning (O. Bousquet, U. von Luxburg, and G. Rätsch, editors), pp. 169-207, Springer, 2004
Theodoros Evgeniou, Massimiliano Pontil, and Tomaso Poggio, Statistical learning theory: a primer, International Journal of Computer Vision, vol. 38, no. 1, pp. 9-13, 2000
Ulrike von Luxburg and Bernhard Schölkopf, Statistical learning theory: models, concepts, and results (http://arxiv.org/abs/0810.4752), 2008
Tomaso Poggio and Steve Smale, The mathematics of learning: dealing with data, Notices of the American Mathematical Society, vol. 50, no. 5, pp. 537-544, 2003
Cosma Shalizi, Learning theory (formal, computational or statistical) (http://cscs.umich.edu/~crshalizi/notebooks/learning-theory.html), Jan 09, 2011 [A nice succinct summary, with lots of useful references]

Thu, Aug 27 Tue, Sep 1 [notes]

Concentration inequalities: Markov, Chebyshev, McDiarmid (bounded differences inequality), examples

Torben Hagerup and Christine Rüb, A guided tour of Chernoff bounds, Information Processing Letters, vol. 33, no. 6, pp. 305-308, 1990 [Short and sweet]
Gábor Lugosi, Concentration-of-measure inequalities, lecture notes, 2003-2009
Colin McDiarmid, Concentration, Probabilistic Methods for Algorithmic Discrete Mathematics, pp. 1-46, 1998
Terence Tao, Concentration of measure (http://terrytao.wordpress.com/2010/01/03/254a-notes-1-concentration-of-measure/), Jan 03, 2010
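
For orientation, the bounded differences inequality named in this topic can be stated as follows. This is the standard textbook form; the symbols $f$, $c_i$, and $t$ are notation chosen here and may differ from the lecture notes.

```latex
% McDiarmid's (bounded differences) inequality.
% Assumption: X_1, ..., X_n are independent, and changing the i-th
% argument of f alone changes its value by at most c_i.
\[
  \sup_{x_1,\dots,x_n,\,x_i'}
  \bigl| f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n) \bigr|
  \le c_i , \qquad i = 1,\dots,n .
\]
% Then, for every t > 0,
\[
  \mathbb{P}\bigl( f(X_1,\dots,X_n) - \mathbb{E} f(X_1,\dots,X_n) \ge t \bigr)
  \le \exp\!\left( - \frac{2 t^2}{\sum_{i=1}^n c_i^2} \right) .
\]
% Example: for the sample mean of [0,1]-valued variables, c_i = 1/n,
% so the right-hand side becomes exp(-2 n t^2).
```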

Thu, Sep 3 [notes]

Formulation of the learning problem: concept and function learning; realizable case; Probably Approximately Correct (PAC) learning.

Dana Angluin, Queries and concept learning, Machine Learning, vol. 2, no. 4, pp. 319-342, 1988
David Haussler, PAC learning model, and decision-theoretic generalizations, with applications to neural nets, in Mathematical Perspectives on Neural Networks, Lawrence Erlbaum Associates, 1995
Leslie Valiant, A theory of the learnable, Communications of the ACM, vol. 27, no. 11, pp. 1134-1142, 1984

Tue, Sep 8 [notes]

Formulation of the learning problem, continued: agnostic (model-free) learning; consistency; Empirical Risk Minimization

Dana Angluin, Queries and concept learning, Machine Learning, vol. 2, no. 4, pp. 319-342, 1988
David Haussler, PAC learning model, and decision-theoretic generalizations, with applications to neural nets, in Mathematical Perspectives on Neural Networks, Lawrence Erlbaum Associates, 1995
Leslie Valiant, A theory of the learnable, Communications of the ACM, vol. 27, no. 11, pp. 1134-1142, 1984

Thu, Sep 10 Tue, Sep 15 [notes]

Empirical Risk Minimization: abstract risk bounds and Rademacher averages -- stochastic inequalities for ERM; Rademacher averages (structural results, Finite Class Lemma); introduction to VC classes

Peter Bartlett and Shahar Mendelson, Rademacher and Gaussian complexities: risk bounds and structural results, Journal of Machine Learning Research, vol. 3, pp. 463-482, 2002
Olivier Bousquet, Stéphane Boucheron, and Gábor Lugosi, Theory of classification: a survey of recent advances, ESAIM Probability and Statistics, vol. 9, pp. 323-375, 2005 (Section 3 only)
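
As a rough guide to the objects in this topic, one common definition of the Rademacher average of a bounded set $A \subset \mathbb{R}^n$, together with the finite-class bound, is sketched below. Normalizations vary between references (some use absolute values inside the supremum, which changes the constant), so the lecture notes may state these slightly differently.

```latex
% Rademacher average of a bounded set A in R^n.
% sigma_1, ..., sigma_n are i.i.d. random signs, each +1 or -1 with
% probability 1/2.
\[
  R_n(A) := \mathbb{E}\left[ \sup_{a \in A} \frac{1}{n} \sum_{i=1}^n \sigma_i a_i \right] .
\]
% Finite class lemma (Massart): if A is finite with |A| = N and
% \|a\|_2 \le L for every a in A, then
\[
  R_n(A) \le \frac{L \sqrt{2 \log N}}{n} .
\]
```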

Thu, Sep 17 [notes]

Vapnik-Chervonenkis classes: shatter coefficients; VC dimension; examples of VC classes; Sauer-Shelah lemma; implication for Rademacher averages

Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred Warmuth, Learnability and the Vapnik-Chervonenkis dimension, Journal of the ACM, vol. 36, no. 4, pp. 929-965, 1989
Gábor Lugosi, Pattern classification and learning theory, in Principles of Nonparametric Learning (L. Györfi, editor), pp. 1-56, Springer, 2002 (parts of Section 1.4)
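
The shatter coefficient and the Sauer-Shelah bound can be checked by brute force on a small example. The sketch below uses closed intervals on the real line (a class of VC dimension 2) purely as an illustration; the function names are hypothetical and not taken from the course materials.

```python
from math import comb


def interval_labelings(points):
    """Return the set of distinct 0/1 labelings of `points` induced by
    indicators of closed intervals [a, b] (plus the empty interval).

    Assumes the points are distinct real numbers.
    """
    n = len(sorted(points))
    labelings = {tuple([0] * n)}  # the empty interval labels everything 0
    # An interval picks out a contiguous run of the sorted points.
    for i in range(n):
        for j in range(i, n):
            labelings.add(tuple(1 if i <= k <= j else 0 for k in range(n)))
    return labelings


def sauer_bound(n, d):
    """Sauer-Shelah bound: sum_{i=0}^{d} C(n, i) for VC dimension d."""
    return sum(comb(n, i) for i in range(d + 1))


if __name__ == "__main__":
    for n in range(1, 8):
        count = len(interval_labelings(range(n)))
        print(n, count, sauer_bound(n, 2), 2 ** n)
```

For intervals the bound is attained with equality, while the number of labelings falls below 2^n as soon as n exceeds the VC dimension.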

Tue, Sep 22 Thu, Sep 24 [notes]

Binary classification: bounds for simple VC classes (linear and generalized linear discriminant rules); surrogate loss functions; margin-based bounds

Peter Bartlett, Michael Jordan, and Jon McAuliffe, Convexity, classification, and risk bounds, Journal of the American Statistical Association, vol. 101, no. 473, pp. 138-156, 2006
Olivier Bousquet, Stéphane Boucheron, and Gábor Lugosi, Theory of classification: a survey of recent advances, ESAIM Probability and Statistics, vol. 9, pp. 323-375, 2005

Tue, Sep 29 Thu, Oct 1

No class: Allerton conference

Tue, Oct 6 Thu, Oct 8 [notes]

Binary classification, continued: reproducing kernel Hilbert spaces and kernel machines; convex risk minimization

Peter Bartlett, Michael Jordan, and Jon McAuliffe, Convexity, classification, and risk bounds, Journal of the American Statistical Association, vol. 101, no. 473, pp. 138-156, 2006
Olivier Bousquet, Stéphane Boucheron, and Gábor Lugosi, Theory of classification: a survey of recent advances, ESAIM Probability and Statistics, vol. 9, pp. 323-375, 2005

Tue, Oct 13 Thu, Oct 15 [notes]

Regression with quadratic loss

Presentation loosely based on Chapter 8 of Cucker and Zhou.

Tue, Oct 20 Thu, Oct 22 Thu, Oct 29 Tue, Nov 3 Thu, Nov 5 [notes]

Stability of learning algorithms: learnability without uniform convergence; average and uniform stability of learning algorithms; the role of convexity and strong convexity; stability of Stochastic Gradient Descent; connection between differential privacy, stability, and generalization

Olivier Bousquet and André Elisseeff, Stability and generalization, Journal of Machine Learning Research, vol. 2, pp. 499-526, 2002
Alexander Rakhlin, Sayan Mukherjee, and Tomaso Poggio, Stability results in learning theory, Analysis and Applications, vol. 3, no. 4, pp. 397-417, 2005
Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, and Karthik Sridharan, Learnability, stability, and uniform convergence, Journal of Machine Learning Research, vol. 11, pp. 2635-2670, 2010
Moritz Hardt, Ben Recht, and Yoram Singer, Train faster, generalize better: stability of stochastic gradient descent, preprint, 2015
Kobbi Nissim and Uri Stemmer, On the generalization properties of differential privacy, preprint, 2015

Tue, Nov 10 Thu, Nov 12

Online learning: basic model; regret; regret bounds for online convex and strongly convex programming via projected gradient descent; online-to-batch conversions; relation to Rademacher averages.

Martin Zinkevich, Online convex programming and generalized infinitesimal gradient ascent, ICML 2003
Elad Hazan, Amit Agarwal, and Satyen Kale, Logarithmic regret algorithms for online convex optimization, Machine Learning, vol. 69, no. 2-3, pp. 169-192, 2007
Nicolò Cesa-Bianchi, Alex Conconi, and Claudio Gentile, On the generalization ability of online learning algorithms, IEEE Transactions on Information Theory, vol. 50, no. 9, pp. 2050-2057, 2004
Jacob Abernethy, Alekh Agarwal, Peter Bartlett, and Alexander Rakhlin, A stochastic view of optimal regret through minimax duality, COLT 2009
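
To make the projected gradient descent scheme concrete, here is a minimal sketch in the spirit of Zinkevich's online convex programming, under illustrative assumptions: the decision set is the interval [-1, 1] (diameter D = 2), the losses are f_t(x) = (x - z_t)^2 with targets z_t in [-1, 1] (so gradients are bounded by G = 4), and the step size is D / (G sqrt(t)). The function names and the target sequence are hypothetical.

```python
import math


def project(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def ogd_total_loss(targets, diameter=2.0, grad_bound=4.0):
    """Run projected online gradient descent on f_t(x) = (x - z_t)^2
    and return the cumulative loss of the online player."""
    x = 0.0
    total = 0.0
    for t, z in enumerate(targets, start=1):
        total += (x - z) ** 2           # suffer the loss at the current point
        grad = 2.0 * (x - z)            # gradient of f_t at x
        eta = diameter / (grad_bound * math.sqrt(t))
        x = project(x - eta * grad)     # gradient step, then projection
    return total


def best_fixed_loss(targets):
    """Cumulative loss of the best fixed point in hindsight.
    For these quadratic losses it is the mean of the targets, clipped to the set."""
    u = project(sum(targets) / len(targets))
    return sum((u - z) ** 2 for z in targets)


def regret(targets):
    return ogd_total_loss(targets) - best_fixed_loss(targets)


if __name__ == "__main__":
    zs = [1.0 if t % 3 == 0 else -0.5 for t in range(300)]
    bound = 1.5 * 4.0 * 2.0 * math.sqrt(len(zs))  # (3/2) G D sqrt(T)
    print(regret(zs), bound)
```

With this step size the standard analysis gives regret at most (3/2) G D sqrt(T), which the script prints next to the observed regret.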

Tue, Nov 17 Thu, Nov 19 Tue, Dec 1 [notes]

Minimax lower bounds: binary classification under a margin assumption; reduction to finite testing on a binary hypercube (Assouad's lemma); extra log factor for rich VC classes; information-theoretic methods (Fano's inequality)

Pascal Massart and Élodie Nédélec, Risk bounds for statistical learning, Annals of Statistics, vol. 34, no. 5, pp. 2326-2366, 2006
Bin Yu, Assouad, Fano, and Le Cam, in Festschrift for Lucien Le Cam (D. Pollard, E. Torgersen, and G. Yang, editors), pp. 423-435, Springer, 1997