
# Statistical Modeling and Machine Learning

## Module IN2332

This module is offered by the TUM Department of Informatics.

This module handbook describes the content, learning outcomes, teaching methods, and examination type, and links to the current dates for courses and the module examination in the respective sections.

### Basic Information

IN2332 is a one-semester, Master's-level module taught in English and offered in the summer semester.

This module description is valid until WS 2020/1.

| Total workload | Contact hours | Credits (ECTS) |
|---|---|---|
| 240 h | 120 h | 8 CP |

### Content, Learning Outcome and Preconditions

#### Content

0. Univariate and simple multivariate calculus, and a summary of linear algebra with intuitive explanations
1. Concepts in machine learning: supervised vs. unsupervised learning, classification vs. regression, overfitting, curse of dimensionality
2. Probability theory, Bayes' theorem, conditional independence, distributions (multinomial, Poisson, Gaussian, gamma, beta, ...), central limit theorem, entropy, mutual information
3. Generative models for discrete data: likelihood, prior, posterior, Dirichlet-multinomial model, naive Bayes classifiers
4. Gaussian models: maximum likelihood estimation, linear discriminant analysis, linear Gaussian systems
5. Bayesian statistics: maximum a posteriori (MAP) estimation, model selection, uninformative and robust priors, hierarchical and empirical Bayes, Bayesian decision theory
6. Frequentist statistics: bootstrap, statistical testing
7. Linear regression: ordinary least squares, robust linear regression, ridge regression, Bayesian linear regression
8. Logistic regression and optimization: (Bayesian) logistic regression, optimization, L2 regularization, Laplace approximation, Bayesian information criterion (BIC)
9. Generalized linear models: the exponential family, probit regression
10. Expectation maximization (EM) algorithm with applications
11. Latent linear models: principal component analysis (PCA), Bayesian PCA
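Topics 3 to 5 revolve around the interplay of likelihood, prior, and posterior. As an illustration only, here is a minimal sketch assuming the Beta-binomial model for a coin's heads probability, which is the two-category special case of the Dirichlet-multinomial model in topic 3. It contrasts the maximum likelihood estimate with the MAP estimate and the posterior mean. The class and function names and the Beta(2, 2) prior are hypothetical choices for this example; the course itself works in R, so Python is used here purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(a, b) distribution over a coin's heads probability theta (hypothetical helper)."""
    a: float
    b: float


def update(prior: Beta, heads: int, tails: int) -> Beta:
    # Conjugacy: a Beta(a, b) prior times a binomial likelihood gives
    # a Beta(a + heads, b + tails) posterior.
    return Beta(prior.a + heads, prior.b + tails)


def mle(heads: int, tails: int) -> float:
    # Maximum likelihood estimate: the observed fraction of heads.
    if heads + tails == 0:
        raise ValueError("need at least one observation")
    return heads / (heads + tails)


def map_estimate(post: Beta) -> float:
    # MAP estimate = mode of Beta(a, b); this formula holds for a > 1 and b > 1.
    if post.a <= 1 or post.b <= 1:
        raise ValueError("formula only valid for a > 1 and b > 1")
    return (post.a - 1) / (post.a + post.b - 2)


def posterior_mean(post: Beta) -> float:
    # Mean of Beta(a, b).
    return post.a / (post.a + post.b)


if __name__ == "__main__":
    prior = Beta(2, 2)  # hypothetical weakly informative prior
    post = update(prior, heads=3, tails=0)
    print(post)                  # Beta(a=5, b=2)
    print(mle(3, 0))             # 1.0
    print(map_estimate(post))    # 0.8
    print(posterior_mean(post))  # 0.7142857142857143
```

After three heads in a row, the MLE claims the coin always lands heads, while the prior pulls the MAP estimate to 0.8 and the posterior mean to 5/7. This is the regularizing effect of a prior on small samples.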

#### Learning Outcome

At the end of the module, students are able to:
1. remember the concepts of supervised and unsupervised learning and implement cross-validation procedures
2. remember the concepts of Bayesian probability and of conditional and unconditional dependence
3. mathematically derive the models and inference procedures of Bayesian linear regression, generalized linear models, Bayesian principal component analysis, and k-means
4. identify use cases for the models above
5. apply these models using the R programming language
6. assess the performance and significance of their results
7. develop simple novel Bayesian models, together with the corresponding inference procedures, for situations in which the models above do not apply
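Learning outcome 1 asks students to implement cross-validation procedures. A minimal sketch of k-fold cross-validation follows. It assumes a one-dimensional least-squares line as the model and mean squared error as the score, both chosen only to keep the example short. The function names are hypothetical, and Python stands in for the R used in the course.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    # One-dimensional ordinary least squares for y = w0 + w1 * x.
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w1 = sxy / sxx
    return my - w1 * mx, w1


def k_fold_indices(n, k, seed=0):
    # Shuffle the indices once, then deal them round-robin into k disjoint folds.
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_mse(xs, ys, k=5, seed=0):
    # Each fold is held out once: refit on the other folds and
    # collect the squared errors on the held-out points.
    errors = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        w0, w1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((ys[i] - (w0 + w1 * xs[i])) ** 2 for i in fold)
    return fmean(errors)  # mean squared error over all held-out points


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [float(i) for i in range(30)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 1.0) for x in xs]
    print(cross_val_mse(xs, ys, k=5))
```

Every point is scored by a model that never saw it during fitting. The resulting error estimate is therefore suitable for comparing candidate models, unlike the training error, which rewards overfitting.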

#### Preconditions

Linear algebra and multivariate calculus

### Courses, Learning and Teaching Methods and Literature

#### Learning and Teaching Methods

The class is based on Christopher Bishop's book "Pattern Recognition and Machine Learning" and is taught in an inverted-classroom style. Each week, a ~30 min overview of the next reading assignment (a section of the book) points out its essential messages to make the reading at home easier. Exercises to be solved by the next lecture are handed out, including mathematical derivations of some results from the book. In the next lecture, the exercises are discussed (~30 min), and questions and difficulties with the material are addressed (~20 min). Then, practical exercises using the newly acquired material are solved in teams with the R statistics framework (100 min). Further exercises are worked on in smaller groups during the Friday classes (3 hours). In our experience, the inverted classroom is better suited than conventional lecturing for quantitative topics that require students to think through or retrace mathematical derivations at their own pace.

#### Media

Exercises (math and programming) posted online weekly, slides, chalkboard, live demos

#### Literature

Christopher Bishop, *Pattern Recognition and Machine Learning*

### Module Exam

#### Description of exams and course work

The learning outcomes are assessed in a two-hour written final exam. It includes knowledge questions (learning outcomes 1, 2, 4), statistical modeling questions (deriving the likelihood and the inference procedure of a model not seen in class; learning outcomes 3, 7), some R programming (learning outcome 5), and interpretation of results (learning outcome 6).

#### Exam Repetition

The exam can be retaken in the following semester.
