# Optimal Transport (From the classical Wasserstein distance to multi-marginal problems in physics and machine learning)

## Module MA5934

This module is offered by the TUM Department of Mathematics.

This module handbook describes the contents, learning outcomes, methods, and examination type, and links to current dates for courses and the module examination in the respective sections.

### Basic Information

MA5934 is a semester module taught in English at Master's level and offered irregularly.

This module is included in the following catalogues within the physics study programs.

- Catalogue of non-physics elective courses

Total workload | Contact hours | Credits (ECTS)
---|---|---
270 h | 90 h | 9 CP

### Content, Learning Outcome and Preconditions

#### Content

The mathematical theory of optimal transport (OT) is undergoing rapid development, and the classical introductory reference by Villani [1] is fast becoming out of date. Current interest is shifting to high-dimensional multi-marginal problems as they arise in economics, fluid dynamics, many-particle physics, and machine learning. In this course (4h lectures + 2h exercise class) I will develop the basic theory from scratch and discuss carefully selected applications from all of the above fields. Of course, this is a math course and I do not assume prior familiarity with these fields; I will cover what you need to know to appreciate how OT is being used there.

Questions I will address include:

- How does one get from Monge's problem of how to efficiently transport a pile of sand into a hole to the celebrated Wasserstein distance on probability measures?

- Why does this distance metrize weak convergence?

- Why is it far more effective at modern machine learning tasks like pattern recognition than traditional distances between density profiles from your undergraduate courses, like L^p?

- How and why did Kantorovich modify Monge's problem into an (infinite-dimensional) linear optimization problem, and how does this allow one to bring to bear powerful methods of linear (functional) analysis like weak convergence and duality?

- Why are contemporary pattern recognition and many-particle physics applications of optimal transport impeded by the 'curse of dimension'? (Rough answer: you are dealing with N-marginal problems, where N is the number of patterns in your database or the number of particles, respectively; Kantorovich requires you to find a joint probability density on the product of the spaces on which the marginals live; but even if each marginal is crudely discretized by its values on 10 gridpoints, a joint probability density on the N-fold product space would require 10^N gridpoint values!)

- What is the state of the art in trying to overcome the curse of dimension (e.g., via convex geometry)?
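To make the Kantorovich question above concrete, here is a minimal sketch (not part of the course material) of the discretized Kantorovich problem as a linear program: two probability vectors on a small 1D grid, a cost matrix given by distances, and a transport plan found with `scipy.optimize.linprog`. The grid and weights are made-up illustration data; in 1D the result can be cross-checked against SciPy's closed-form `wasserstein_distance`.

```python
import numpy as np
from scipy.optimize import linprog
from scipy.stats import wasserstein_distance

# Two discrete probability measures on the same grid of points (made-up data).
x = np.array([0.0, 1.0, 2.0, 3.0])
mu = np.array([0.4, 0.1, 0.3, 0.2])
nu = np.array([0.1, 0.3, 0.2, 0.4])

n = len(x)
# Cost matrix c_ij = |x_i - x_j| gives the W1 (Wasserstein-1) distance.
C = np.abs(x[:, None] - x[None, :])

# Kantorovich LP: minimize <C, pi> over couplings pi with marginals mu, nu.
# The unknown is the flattened n*n transport plan pi_ij (row-major).
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0   # row sums of pi equal mu
    A_eq[n + i, i::n] = 1.0            # column sums of pi equal nu
b_eq = np.concatenate([mu, nu])

res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
w1_lp = res.fun

# SciPy's closed-form 1D solver gives the same W1 distance.
w1_ref = wasserstein_distance(x, x, mu, nu)
print(w1_lp, w1_ref)  # the two values agree up to solver tolerance
```

Note that the LP is already an n^2-dimensional problem for two marginals; the multi-marginal versions discussed in the course grow much faster still.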


#### Learning Outcome

After completing the course, you will have understood basic concepts of the timely topic of optimal transport. You will have gained the ability to interpret some problems in economics, physics, and machine learning from a mathematical perspective. You will have learned interesting and transferable mathematical methods in discrete optimization, infinite-dimensional linear optimization, duality, convex geometry, functional analysis, the calculus of variations, and partial differential equations.

#### Preconditions

Some familiarity with measure theory and functional analysis (as provided by courses MA2003 and MA3001) will be helpful.

### Courses, Learning and Teaching Methods and Literature

#### Courses and Schedule

Type | SWS | Title | Lecturer(s) | Dates
---|---|---|---|---
VO | 4 | Optimal Transport [MA5934] | Friesecke, G. | |
UE | 2 | Optimal Transport (Exercise Session) [MA5934] | Friesecke, G., Vögler, D. | |

#### Learning and Teaching Methods

In lectures, I will introduce the relevant concepts as well as basic examples on the blackboard and discuss the material with the students. The lectures should motivate and enable students to independently study appropriate parts of the relevant literature. There will be weekly practice sessions in which the students confirm and deepen their understanding of the concepts and methods and check their progress by working, individually and/or in small groups, through assigned problems. Live guidance and feedback are available when needed, but as the term goes on, students are expected to carry out the assigned work in an increasingly independent manner.

#### Media

Blackboard (during lectures); typed notes uploaded after each lecture (for self-study afterwards); beamer (for occasional work on the computer)

#### Literature

My presentation of the material will make two major changes compared to the texts [1] and [2]. First, I will begin by discussing optimal transport problems as they appear after numerical discretization, namely in finite dimensions. This is easier to understand, yet far from uninteresting, as key phenomena already occur in this setting and some ideas for their analysis can be understood without being impeded by infinite-dimensional technicalities. Second, I will use a multi-marginal perspective from the outset. This does not really complicate matters, and provides a unified mathematical setting covering all the main applications.
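The storage blow-up that motivates starting from the finite-dimensional, discretized setting can be sized with a one-line count. The sketch below (a hypothetical helper, not course code) compares the number of values needed to store N discretized marginals against the number needed for a joint density on the N-fold product grid, with each marginal on g = 10 gridpoints as in the curse-of-dimension remark above.

```python
# Storage counts for an N-marginal problem, each marginal discretized
# on g gridpoints (hypothetical sizes for illustration).
def storage(N, g=10):
    marginals = N * g   # N separate one-dimensional densities
    joint = g ** N      # joint density on the N-fold product grid
    return marginals, joint

for N in (2, 5, 10):
    m, j = storage(N)
    print(f"N={N}: marginals need {m} values, joint density needs {j}")
```

Already for N = 10 the joint density needs 10^10 values against 100 for the marginals, which is why the course treats methods (e.g., from convex geometry) for avoiding the full product space.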

[1] C. Villani, Topics in Optimal Transportation, AMS 2003

[2] F. Santambrogio, Optimal Transport for Applied Mathematicians, Birkhäuser 2015


### Module Exam

#### Description of exams and course work

Oral exam (25 minutes). Students demonstrate that they have gained an understanding of key concepts and results of optimal transport theory, as well as of underlying mathematical methods, and can apply them to specific examples.

#### Exam Repetition

The exam may be repeated at the end of the semester.