The schedule will be updated and revised as the course progresses. Links to scribed lecture notes are on the left.

- Tue Jan 15
- Introduction and administrivia
- Control as optimization over time
- Stochastic optimization, static case: complete and partial observations
- Blackwell's principle of irrelevant information

- Thu Jan 17

[week 1 notes] - Controlled Markov processes
- Policies and their expected cost
- Optimality of Markov policies

- Tue Jan 22
- Dynamic programming
- comparison principle
- Bellman's principle of optimality
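
Bellman's principle of optimality can be made concrete with a backward dynamic programming recursion. The sketch below uses a hypothetical 2-state, 2-action finite-horizon MDP (all numbers are illustrative, not from the course):

```python
import numpy as np

# Hypothetical 2-state, 2-action finite-horizon MDP (numbers illustrative).
# P[a][x, y] = transition probability, c[x, a] = per-stage cost.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # action 0
              [[0.5, 0.5], [0.6, 0.4]]])  # action 1
c = np.array([[1.0, 2.0],   # costs at state 0 for actions 0, 1
              [0.0, 0.5]])  # costs at state 1
T = 5                       # horizon

V = np.zeros(2)             # terminal cost V_T = 0
policy = []
for t in range(T - 1, -1, -1):
    # Q[x, a] = c(x, a) + sum_y P(y | x, a) V_{t+1}(y)
    Q = c + np.einsum('axy,y->xa', P, V)
    policy.append(Q.argmin(axis=1))  # Bellman: act greedily w.r.t. V_{t+1}
    V = Q.min(axis=1)
policy.reverse()
print(V)        # optimal cost-to-go at time 0
print(policy)   # optimal Markov policy, one action map per stage
```

The recursion produces a Markov policy, one decision rule per stage, which is exactly the optimality-of-Markov-policies statement in the finite case.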

- Thu Jan 24

[week 2 notes] - Monotonicity of value functions
- stochastic dominance
- stochastically monotone MDPs
- sufficient condition for monotonicity of value functions
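
First-order stochastic dominance is easy to check numerically: on an ordered finite state space, one distribution dominates another iff its CDF lies below pointwise. A minimal sketch, with an illustrative kernel whose rows increase in the dominance order (the stochastic-monotonicity condition):

```python
import numpy as np

def dominates(p, q):
    """First-order stochastic dominance on {0,...,n-1}: p dominates q iff
    the CDF of p lies below that of q at every point."""
    return bool(np.all(np.cumsum(p) <= np.cumsum(q) + 1e-12))

# Illustrative transition kernel on 3 states (rows = current state):
# the chain is stochastically monotone if rows increase in the FOSD order.
P = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
print(all(dominates(P[i + 1], P[i]) for i in range(2)))  # True
```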

- Tue Jan 29
- Markov decision processes with general state and action spaces
- review: Kolmogorov axioms of probability
- measurability, random variables, Borel sigma-fields, Borel spaces
- MDPs with Borel state and action spaces
- optimality of Markov policies in the general case

- Thu Jan 31

[week 3 notes] - The finite-horizon Linear Quadratic Regulator (LQR) problem
- dynamic programming recursion
- completing-the-squares lemma
- structural properties: policies are linear, value functions are quadratic and convex
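
The completing-the-squares argument yields a backward Riccati recursion; a sketch on an assumed double-integrator system (A, B, Q, R below are illustrative, not from the notes):

```python
import numpy as np

# Finite-horizon LQR via the Riccati recursion (illustrative system).
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # double integrator
B = np.array([[0.0], [1.0]])
Q = np.eye(2)           # state cost weight
R = np.array([[1.0]])   # control cost weight
T = 20

P = Q.copy()            # terminal cost x' Q x
gains = []
for _ in range(T):
    # completing the squares gives the linear policy u = -K x with
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    # and the quadratic convex value function x' P x with
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()
print(gains[0])         # time-0 feedback gain
```

The structural properties are visible directly: each `K` gives a linear policy, and each `P` is symmetric positive semidefinite, so the value function is quadratic and convex.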

- Tue Feb 5
- MDPs with partial observations: the finite case
- problem formulation and goals
- prelude: hidden Markov models
- nonlinear filter recursion and Markov property
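
The nonlinear filter recursion for a finite HMM is a predict step (Chapman-Kolmogorov) followed by a Bayes correction; the normalization is what makes it nonlinear. A sketch with assumed toy matrices and observations:

```python
import numpy as np

# Nonlinear filter for a finite hidden Markov model (toy numbers):
# pi_t(x) = P(X_t = x | Y_1..Y_t), updated by predict + Bayes correction.
P = np.array([[0.95, 0.05],   # signal transition matrix
              [0.10, 0.90]])
O = np.array([[0.8, 0.2],     # O[x, y] = P(Y = y | X = x)
              [0.3, 0.7]])

def filter_step(pi, y):
    pred = pi @ P                # prediction step
    post = pred * O[:, y]        # multiply by observation likelihood
    return post / post.sum()     # normalize (the nonlinear part)

pi = np.array([0.5, 0.5])
for y in [0, 0, 1, 1, 1]:        # an assumed observation sequence
    pi = filter_step(pi, y)
print(pi)                        # posterior over the hidden state
```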

- Thu Feb 7

[week 4 notes] - MDPs with partial observations, continued
- definition of admissible policies and expected cost
- state space augmentation: belief state
- belief state process as a controlled Markov process

- Tue Feb 12
- MDPs with partial observations, continued
- reduction to the fully observed case via belief state
- dynamic programming in belief space
- example: the two-armed bandit

- Thu Feb 14

[week 5 notes] - The two-armed bandit problem
- state-space formulation
- derivation of the belief state evolution
- optimality of the index policy
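
One standard instance of the belief-state evolution is the Bayesian Bernoulli bandit, where independent Beta priors on the unknown success probabilities make the belief update conjugate (the course's exact formulation may differ; this only illustrates the belief dynamics, not the index policy itself):

```python
# Belief-state evolution for a Bernoulli two-armed bandit, assuming
# independent Beta priors on the unknown success probabilities.
beliefs = [[1, 1], [1, 1]]          # Beta(alpha, beta) parameters per arm

def pull(arm, reward):
    """Conjugate update: a success bumps alpha, a failure bumps beta."""
    a, b = beliefs[arm]
    beliefs[arm] = [a + reward, b + (1 - reward)]

def posterior_mean(arm):
    a, b = beliefs[arm]
    return a / (a + b)

for arm, r in [(0, 1), (0, 1), (1, 0), (0, 0)]:  # assumed play/reward history
    pull(arm, r)
print(posterior_mean(0), posterior_mean(1))
```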

- Tue Feb 19
- Stochastic control over an infinite horizon
- infinite-horizon optimality criteria: discounted cost, average cost
- discounted cost
- Discounted-Cost Optimality Equation
- contraction property of the dynamic programming operator
- existence and uniqueness of the value function, value iteration
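
Because the DP operator is a beta-contraction in the sup norm, value iteration converges geometrically to the unique fixed point of the Discounted-Cost Optimality Equation. A sketch on an assumed toy MDP:

```python
import numpy as np

# Value iteration for V = min_a [ c(·, a) + beta * P_a V ] (toy data).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])   # P[a][x, y]
c = np.array([[1.0, 2.0], [0.0, 0.5]])     # c[x, a]
beta = 0.9

V = np.zeros(2)
for _ in range(500):
    TV = (c + beta * np.einsum('axy,y->xa', P, V)).min(axis=1)
    if np.max(np.abs(TV - V)) < 1e-10:     # contraction => unique fixed point
        V = TV
        break
    V = TV
print(V)   # approximate fixed point of the DP operator
```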

- Thu Feb 21

[week 6 notes] - Discounted cost, continued
- policy iteration
- structure of discount-optimal policies
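
Policy iteration alternates exact policy evaluation (a linear solve) with greedy improvement, and terminates at a discount-optimal policy in finitely many steps on a finite MDP. A sketch on the same assumed toy data:

```python
import numpy as np

# Policy iteration for a toy discounted MDP (data assumed).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])   # P[a][x, y]
c = np.array([[1.0, 2.0], [0.0, 0.5]])     # c[x, a]
beta = 0.9
n = 2

pi = np.zeros(n, dtype=int)                # initial policy: always action 0
while True:
    # evaluate: V_pi solves (I - beta P_pi) V = c_pi
    P_pi = P[pi, np.arange(n)]             # row x of P[pi[x]]
    c_pi = c[np.arange(n), pi]
    V = np.linalg.solve(np.eye(n) - beta * P_pi, c_pi)
    # improve: greedy with respect to V
    new_pi = (c + beta * np.einsum('axy,y->xa', P, V)).argmin(axis=1)
    if np.array_equal(new_pi, pi):
        break                              # fixed point: pi is discount-optimal
    pi = new_pi
print(pi, V)
```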

- Tue Feb 26
- Average-cost optimality criterion
- canonical triplets
- Average-Cost Optimality Equation
- relative value function as terminal cost for a finite-horizon problem
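
Relative value iteration solves the Average-Cost Optimality Equation g + h(x) = min_a [c(x,a) + Σ_y P(y|x,a) h(y)] by normalizing the DP iterates at a reference state. A sketch on an assumed toy MDP (convergence uses that all transition probabilities here are positive):

```python
import numpy as np

# Relative value iteration for the average-cost optimality equation.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.6, 0.4]]])   # P[a][x, y], toy data
c = np.array([[1.0, 2.0], [0.0, 0.5]])     # c[x, a]

h = np.zeros(2)
for _ in range(2000):
    Th = (c + np.einsum('axy,y->xa', P, h)).min(axis=1)
    g = Th[0]                  # normalize the relative value at state 0
    h_new = Th - g
    if np.max(np.abs(h_new - h)) < 1e-12:
        h = h_new
        break
    h = h_new
print(g, h)   # optimal average cost and relative value function
```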

- Thu Feb 28

- Tue Mar 5
- Linear Quadratic Regulator under the average-cost criterion
- quadratic ansatz for the relative value function, discrete algebraic Riccati equation
- conditions for existence and uniqueness of the fixed point of the DARE
- existence of stabilizing optimal controller
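
The fixed point of the DARE can be computed by iterating the finite-horizon Riccati recursion; under stabilizability of (A, B) and detectability of (A, Q^{1/2}) the iteration converges to the unique stabilizing solution. A sketch on an assumed double-integrator system:

```python
import numpy as np

# Solving the discrete algebraic Riccati equation (DARE) by iterating the
# Riccati recursion to its fixed point (illustrative system matrices).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = Q.copy()
for _ in range(1000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P_new = Q + A.T @ P @ (A - B @ K)
    if np.max(np.abs(P_new - P)) < 1e-12:
        P = P_new
        break
    P = P_new
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print(np.abs(np.linalg.eigvals(A - B @ K)).max())  # < 1: stabilizing controller
```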

- Thu Mar 7

[week 8 notes] - Linear-Quadratic-Gaussian (LQG) problem with partial observations
- problem formulation
- linear-Gaussian HMM: derivation of the Kalman filter
- controllability and observability conditions for existence and uniqueness of the steady-state Kalman filter
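
The Kalman filter is the predict/update recursion for the linear-Gaussian observation model x_{t+1} = A x_t + w_t, y_t = C x_t + v_t. A sketch with assumed matrices and observations (all numbers illustrative):

```python
import numpy as np

# One predict/update cycle of the Kalman filter (illustrative model).
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
W = 0.1 * np.eye(2)      # process noise covariance
V = np.array([[0.5]])    # observation noise covariance

def kalman_step(x_hat, P, y):
    # predict
    x_pred = A @ x_hat
    P_pred = A @ P @ A.T + W
    # update: the Kalman gain weighs the innovation y - C x_pred
    S = C @ P_pred @ C.T + V
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(2) - K @ C) @ P_pred
    return x_new, P_new

x_hat, P = np.zeros(2), np.eye(2)
for y in [np.array([0.9]), np.array([2.1]), np.array([2.9])]:  # assumed data
    x_hat, P = kalman_step(x_hat, P, y)
print(x_hat, P)
```

Iterating the covariance recursion alone gives the steady-state filter whose existence and uniqueness the controllability/observability conditions guarantee.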

- Tue Mar 12
- Review
- Thu Mar 14
- **In-Class Midterm**

- Tue Mar 19
- Thu Mar 21
- **SPRING BREAK**

- Tue Mar 26

- Thu Mar 28

[week 10 notes] - Partially observed LQG
- review: fully observed LQR, the Kalman filter
- state estimation in the presence of control: information equivalence, linear dynamics of the information state
- partially observed LQG problem, finite-horizon and infinite horizon
- separation between estimation and control, the dual effect

- Tue Apr 2

- Thu Apr 4

[week 11 notes] - Continuous-time Markov processes
- abstract definition, transition kernels, Chapman-Kolmogorov equation, time-homogeneous processes
- finite-state continuous-time Markov processes
- pure-jump processes, transition intensities and Q-matrices
- forward Kolmogorov equation
- a construction using Poisson processes: uniformization
- diffusion processes
- definition: local drift coefficient and diffusion matrix
- infinitesimal generator: second-order diffusion operator
- a construction using an Euler scheme and Brownian motion, conditions for well-posedness of the forward Kolmogorov equation
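
The uniformization construction can be checked numerically: with lam at least the largest exit rate, P = I + Q/lam is a stochastic matrix, and running this jump chain on a Poisson(lam) clock reproduces the CTMC, i.e. e^{tQ} = Σ_k Poisson(lam·t)(k) P^k. A sketch with an assumed toy Q-matrix:

```python
import numpy as np

# Uniformization of a finite-state CTMC (toy generator matrix).
Q = np.array([[-2.0, 2.0],
              [1.0, -1.0]])
lam = 2.0                      # lam >= max_x |Q[x, x]|
P = np.eye(2) + Q / lam        # stochastic jump-chain kernel: [[0, 1], [0.5, 0.5]]

t = 0.7
approx = np.zeros((2, 2))
weight = np.exp(-lam * t)      # Poisson(lam * t) pmf at k = 0
Pk = np.eye(2)
for k in range(60):            # truncate the Poisson sum
    approx += weight * Pk
    Pk = Pk @ P
    weight *= lam * t / (k + 1)
print(approx)                  # approximates the transition matrix e^{tQ}
```

For this two-state generator the matrix exponential is known in closed form, which makes the construction easy to verify.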

- Tue Apr 9

- Thu Apr 11

[week 12 notes] - Ito calculus and controlled diffusion processes
- Filtrations, adapted processes, stochastic integral in the sense of K. Ito
- Ito's lemma
- Controlled diffusion processes: expected cost, cost-to-go, value function
- Necessary condition for optimality, the Hamilton-Jacobi-Bellman equation
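
For the controlled diffusion dX_t = b(X_t, u_t) dt + σ(X_t, u_t) dW_t with running cost c and terminal cost g, the HJB equation in its standard finite-horizon form (sign conventions vary across texts) reads:

```latex
\partial_t V(t,x)
  + \min_{u}\Big[\, c(x,u) + b(x,u)^\top \nabla_x V(t,x)
  + \tfrac{1}{2}\,\mathrm{tr}\big(\sigma\sigma^\top(x,u)\,\nabla_x^2 V(t,x)\big) \Big] = 0,
\qquad V(T,x) = g(x).
```

The bracketed expression applies the infinitesimal generator of the controlled diffusion to V, so this is the continuous-time analogue of the discrete-time DP recursion.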

- Tue Apr 16

- Thu Apr 18

[week 13 notes] - Markov decision processes in continuous time: examples
- The linear quadratic regulator in continuous time: HJB equation and Riccati equation, optimal policy, value function
- Controlled continuous-time finite-state Markov processes

- Tue Apr 23

- Mon Apr 29

[week 14 notes] - Nonlinear filtering in continuous time
- The linear quadratic Gaussian problem in continuous time: the Kalman-Bucy filter
- General nonlinear filtering in continuous time
- the reference measure approach: Girsanov change of measure, the Kallianpur-Striebel formula
- evolution of the nonlinear filter: unnormalized (Duncan-Mortensen-Zakai equation) and normalized (Kushner-Stratonovich equation)
- finite-state signal: the Wonham filter