Related Video Lectures

Summer 2014

These videos and lecture notes are from a 6-lecture, 12-hour short course on Approximate Dynamic Programming, taught by Professor Dimitri P. Bertsekas at Tsinghua University in Beijing, China, in June 2014. They focus primarily on advanced, research-oriented issues of large-scale infinite horizon dynamic programming, corresponding to Lectures 11-23 of the MIT 6.231 course.

The complete set of lecture notes is available here: Complete Slides (PDF - 1.6MB), and is also divided by lecture below. Additional supporting material can be obtained from Prof. Bertsekas' website.

Note to OCW Users: All videos are from Shuvomoy Das Gupta on YouTube and are not provided under our Creative Commons License.


Introduction to Dynamic Programming (DP)

  • Approximate DP
  • Finite Horizon Problems
  • DP Algorithm for Finite Horizon Problems
  • Infinite Horizon Problems
  • Basic Theory of Discounted Infinite Horizon Problems

Approximate Dynamic Programming, Lecture 1, Part 1 (00:52:25)
Approximate Dynamic Programming, Lecture 1, Part 2 (00:41:56)
Approximate Dynamic Programming, Lecture 1, Part 3 (00:28:12)
Lecture 1 (PDF)

Review of Discounted Problem Theory, Shorthand Notation

  • Algorithms for Discounted DP
  • Value Iteration (VI)
  • Policy Iteration (PI)
  • Q-Factors and Q-Learning
  • DP Models
  • Asynchronous Algorithms
Approximate Dynamic Programming, Lecture 2, Part 1 (00:38:47)
Approximate Dynamic Programming, Lecture 2, Part 2 (00:45:40)
Approximate Dynamic Programming, Lecture 2, Part 3 (00:31:00)
Lecture 2 (PDF)

General Issues of Approximation and Simulation for Large-Scale Problems

  • Introduction to Approximate DP
  • Approximation Architectures
  • Simulation-Based Approximate Policy Evaluation
  • General Issues Regarding Approximation and Simulation
Approximate Dynamic Programming, Lecture 3, Part 1 (01:12:44)
Approximate Dynamic Programming, Lecture 3, Part 2 (00:56:00)
Lecture 3 (PDF)

Approximate Policy Iteration Based on Temporal Differences, Projected Equations, Galerkin Approximation

  • Approximation in Value Space
  • Approximate VI and PI
  • Projected Bellman Equations
  • Matrix Form of the Projected Equation
  • Simulation-Based Implementation
  • LSTD and LSPE Methods
  • Bias-Variance Tradeoff
Approximate Dynamic Programming, Lecture 4, Part 1 (00:38:13)
Approximate Dynamic Programming, Lecture 4, Part 2 (00:45:30)
Lecture 4 (PDF)

Aggregation Methods

  • Review of Approximate PI Based on Projected Bellman Equations
  • Issues of Policy Improvement
  • Exploration Enhancement in Policy Evaluation
  • Oscillations in Approximate PI
  • Aggregation: Examples, Simulation-Based, Relation with Projected Equations
Approximate Dynamic Programming, Lecture 5, Part 1 (00:38:25)
Approximate Dynamic Programming, Lecture 5, Part 2 (00:36:27)
Approximate Dynamic Programming, Lecture 5, Part 3 (00:40:45)
Lecture 5 (PDF)

Q-Learning, Approximation in Policy Space

  • Review of Q-Factors and Bellman Equations for Q-Factors
  • VI and PI for Q-Factors
  • Q-Learning: Combination of VI and Sampling
  • Q-Learning and Cost Function Approximation
  • Adaptive Dynamic Programming
  • Approximation in Policy Space
  • Additional Topics
Approximate Dynamic Programming, Lecture 6, Part 1 (00:47:43)
Approximate Dynamic Programming, Lecture 6, Part 2 (00:45:18)
Lecture 6 (PDF)

Summer 2012

These notes are from a condensed, more research-oriented version of the course, taught by Prof. Bertsekas in Summer 2012.

Short Course Notes (PDF)