Optimal Transport (From the classical Wasserstein distance to multimarginal problems in physics and machine learning)
Module MA5934
This module handbook serves to describe contents, learning outcome, methods and examination type as well as linking to current dates for courses and module examination in the respective sections.
Basic Information
MA5934 is a semester module in English language at Master’s level which is offered irregular.
This Module is included in the following catalogues within the study programs in physics.
 Catalogue of nonphysics elective courses
Total workload  Contact hours  Credits (ECTS) 

270 h  90 h  9 CP 
Content, Learning Outcome and Preconditions
Content
The mathematical theory of optimal transport (OT) is undergoing rapid development and the classical introductory reference by Villani [1] is fast becoming out of date. Current interest is shifting to highdimensional multimarginal problems as arising in economics, fluid dynamics, manyparticle physics, machine learning. In this (4h lectures + 2h class) course I will develop the basic theory from scratch, and discuss carefully selected applications from all of the above fields. Of course this is a math course and I do not assume prior familiarity with these fields  I will cover what you need to know to appreciate how OT is being used there.
Questions I will address include:
 How does one get from Monge's problem of how to efficiently transport a pile of sand into a hole to the celebrated Wasserstein distance on probability measures?
 Why does this distance metrize weak convergence?
 Why is it far more effective at modern machine learning tasks like pattern recognition than traditional distances between density profiles from your undergraduate courses, like L^p?
 How and why did Kantorovich modify Monge's problem into an (infinitedimensional) linear optimization problem, and how does this allow to bring to bear powerful methods of linear (functional) analysis like weak convergence and duality?
 Why are contemporary pattern recognition and manyparticle physics applications of optimal transport impeded by the 'curse of dimension'? (Rough answer: you are dealing with Nmarginal problems, where N is the number of patterns in your database respectively the number of particles; Kantorovich requires you to find a ''joint probability density'' on the product of the spaces on which the marginals live; but even if marginals are crudely discretized by their values on 10 gridpoints, joint probability densities on the N fold product space would require 10^N gridpoint values!)
 What is the state of the art in trying to overcome the curse of dimension (e.g., via convex geometry)?
Questions I will address include:
 How does one get from Monge's problem of how to efficiently transport a pile of sand into a hole to the celebrated Wasserstein distance on probability measures?
 Why does this distance metrize weak convergence?
 Why is it far more effective at modern machine learning tasks like pattern recognition than traditional distances between density profiles from your undergraduate courses, like L^p?
 How and why did Kantorovich modify Monge's problem into an (infinitedimensional) linear optimization problem, and how does this allow to bring to bear powerful methods of linear (functional) analysis like weak convergence and duality?
 Why are contemporary pattern recognition and manyparticle physics applications of optimal transport impeded by the 'curse of dimension'? (Rough answer: you are dealing with Nmarginal problems, where N is the number of patterns in your database respectively the number of particles; Kantorovich requires you to find a ''joint probability density'' on the product of the spaces on which the marginals live; but even if marginals are crudely discretized by their values on 10 gridpoints, joint probability densities on the N fold product space would require 10^N gridpoint values!)
 What is the state of the art in trying to overcome the curse of dimension (e.g., via convex geometry)?
Learning Outcome
After following the course, you have understood basic concepts concerning the timely topic of optimal transport. You have gained an ability to interpret some problems in economics, physics, and machine learning from a mathematical perspective. You have learned interesting and transferrable mathematical methods in discrete optimization, infnitedimensional linear optimization, duality, convex geometry, functional analysis, calculus of variations, and partial differential equations.
Preconditions
Some familiarity with measure theory and functional analysis (as
provided by courses MA2003 and MA3001) will be helpful.
provided by courses MA2003 and MA3001) will be helpful.
Courses, Learning and Teaching Methods and Literature
Courses and Schedule
Type  SWS  Title  Lecturer(s)  Dates  Links 

VO  4  Optimal Transport [MA5934]  Friesecke, G. Penka, M. 
Tue, 10:15–11:45, MI 00.09.022 Fri, 10:15–11:45, MI 00.09.022 and singular or moved dates 

UE  2  Optimal Transport (Exercise Session)[MA5934]  Friesecke, G. Penka, M. 
Mon, 12:15–13:45, MI 00.09.022 
Learning and Teaching Methods
In lectures, I will introduce the relevant concepts as well as basic examples on the blackboard and discuss the material with the students. The lectures should motivate and enable students to independently study appropriate parts of the relevant literature. There will be weekly practice sessions in which the students confirm and deepen their understanding of the concepts and methods and check their progress by working  individually and/or in small groups  through assigned problems. Live guidance and feedback is available when needed; but as term goes on students are expected to carry out the assigned work in a more and more independent manner.
Media
Blackboard (during lectures); typed notes uploaded after each lecture (for selfstudy afterwards); beamer (for occasional work on the computer)
Literature
My presentation of the material will make two major changes compared to the texts [1] and [2]. First, I will begin by discussing optimal transport problems as they appear after numerical discretization, namely in finite dimensions. This is easier to understand, yet far from uninteresting, as key phenomena already occur in this setting and some ideas for their analysis can be understood without being impeded by infinitedimensional technicalities. Second, I will use a multimarginal perspective from the outset. This does not really complicate matters, and provides a unified mathematical setting covering all the main applications.
[1] C. Villani, Topics in Optimal Transportation, AMS 2003
[2] F. Santambrogio, Optimal Transport for Applied Mathematicians, Birkhäuser 2015
[1] C. Villani, Topics in Optimal Transportation, AMS 2003
[2] F. Santambrogio, Optimal Transport for Applied Mathematicians, Birkhäuser 2015
Module Exam
Description of exams and course work
Oral exam (25 minutes). Students demonstrate that they have gained an understanding of key concepts and results of optimal transport theory, as well as of underlying mathematical methods, and can apply them to specific examples.
Exam Repetition
The exam may be repeated at the end of the semester.