Syllabus

Course Meeting Times

Lectures: 3 sessions / week, 1 hour / session

Prerequisites

Permission of the instructor is required. Helpful background courses (recommended but not required): Theory of Probability (18.175) and either Statistical Learning Theory and Applications (9.520) or Machine Learning (6.867).

Description

The main goal of this course is to study the generalization ability of a number of popular machine learning algorithms, such as boosting, support vector machines, and neural networks. We will develop technical tools that allow us to give qualitative explanations of why these learning algorithms work so well in many classification problems.

Topics of the course include Vapnik-Chervonenkis theory, concentration inequalities in product spaces, and other elements of empirical process theory.

Grading

The grade is based upon two problem sets and class attendance.

Course Outline

Introduction

  • Classification problem set-up (the standard formulation is recalled after this list)
  • Examples of learning algorithms: Voting algorithms (boosting), support vector machines, neural networks
  • Analyzing generalization ability
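
For concreteness, the classification setting above can be summarized as follows. We observe an i.i.d. sample (X_1, Y_1), ..., (X_n, Y_n) drawn from an unknown distribution P on a product space of inputs and labels {-1, +1}; a learning algorithm outputs a classifier f, whose generalization error and training (empirical) error are

\[ L(f) = \Pr\bigl(f(X) \neq Y\bigr), \qquad \hat{L}_n(f) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{f(X_i) \neq Y_i\}. \]

Analyzing generalization ability means bounding the gap between L(f) and \hat{L}_n(f) for the classifier produced by the algorithm.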

Technical Tools: Elements of Empirical Process Theory

One-dimensional Concentration Inequalities

  • Chebyshev (Markov), Rademacher, Hoeffding, Bernstein, Bennett (Hoeffding's bound is stated below as an example)
  • Toward uniform bounds: Union bound, clustering
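
As a representative example of the bounds in this unit, Hoeffding's inequality states that for independent random variables X_1, ..., X_n with X_i taking values in [a_i, b_i] almost surely, and any t > 0,

\[ \Pr\left( \sum_{i=1}^{n} \bigl(X_i - \mathbb{E}X_i\bigr) \ge t \right) \le \exp\left( - \frac{2t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \right). \]

Bernstein's and Bennett's inequalities refine this bound when the variances of the X_i are small.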

Vapnik-Chervonenkis Theory and More

  • VC classes of sets and functions
  • Shattering numbers, growth function, covering numbers (the Sauer-Shelah bound is recalled after this list)
  • Examples of VC classes, properties
  • Uniform deviation bounds
  • Symmetrization
  • Kolmogorov's chaining technique
  • Dudley's entropy integral
  • Contraction principles
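
A key fact behind the items on shattering numbers and the growth function is the Sauer-Shelah lemma: if a class of sets \(\mathcal{C}\) has VC dimension d, then for all n \ge d its growth function satisfies

\[ \Pi_{\mathcal{C}}(n) \;=\; \max_{x_1,\dots,x_n} \bigl|\{ C \cap \{x_1,\dots,x_n\} : C \in \mathcal{C} \}\bigr| \;\le\; \sum_{i=0}^{d} \binom{n}{i} \;\le\; \left(\frac{en}{d}\right)^{d}, \]

so the growth function is polynomial rather than exponential in n. Combined with symmetrization, this yields the uniform deviation bounds listed above.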

Concentration Inequalities

  • Talagrand's concentration inequality on the cube
  • Symmetrization
  • Talagrand's concentration inequality for empirical processes
  • Vapnik-Chervonenkis type inequalities
  • Martingale-difference inequalities (McDiarmid's inequality is stated below as an example)
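
As an example of the martingale-difference inequalities in this unit, McDiarmid's bounded differences inequality states that if X_1, ..., X_n are independent and a function f satisfies |f(x) - f(x')| \le c_i whenever x and x' differ only in the i-th coordinate, then for any t > 0,

\[ \Pr\bigl( f(X_1,\dots,X_n) - \mathbb{E}\, f(X_1,\dots,X_n) \ge t \bigr) \le \exp\left( - \frac{2t^2}{\sum_{i=1}^{n} c_i^2} \right). \]

Talagrand's inequalities provide sharper, variance-sensitive versions of such statements for suprema of empirical processes.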

Applications

  • Generalization ability of voting classifiers, neural networks, support vector machines (a representative margin bound is sketched below)
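
A typical result of the kind covered in this unit is the margin bound for voting (boosting-type) classifiers due to Schapire, Freund, Bartlett, and Lee, stated here informally with constants absorbed into the O(·): if f is a convex combination of base classifiers from a class of VC dimension d, then with probability at least 1 - \delta, for every margin \theta > 0,

\[ \Pr\bigl( Y f(X) \le 0 \bigr) \;\le\; \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{ Y_i f(X_i) \le \theta \} \;+\; O\!\left( \sqrt{ \frac{d \log^2(n/d)}{n\,\theta^2} + \frac{\log(1/\delta)}{n} } \right). \]

Analogous margin- and norm-based bounds hold for support vector machines and neural networks.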