Statistical Modeling and Machine Learning

Module IN2332

This module is offered by the Department of Informatics.

In addition to the descriptions of content, learning outcomes, teaching and learning methods, and examination formats, this module description also contains references to the current courses and the module exam dates in the respective sections.

Basic data

IN2332 is a one-semester module taught in English at Master's level and offered in the summer semester.

The module is part of the following catalogs in the physics degree programs:

  • General catalog of non-physics elective courses

Total workload    Contact hours    Credits (ECTS)
240 h             120 h            8 CP

Content, learning outcomes, and prerequisites

Content

0. Univariate and simple multivariate calculus and summary of linear algebra with intuitive explanations
1. Concepts in machine learning: supervised vs. unsupervised learning, classification vs. regression, overfitting, curse of dimensionality
2. Probability theory, Bayes' theorem, conditional independence, distributions (multinomial, Poisson, Gaussian, gamma, beta, ...), central limit theorem, entropy, mutual information
3. Generative models for discrete data: likelihood, prior, posterior, Dirichlet-multinomial model, naive Bayes classifiers
4. Gaussian models: maximum likelihood estimation, linear discriminant analysis, linear Gaussian systems
5. Bayesian statistics: maximum a posteriori estimation, model selection, uninformative and robust priors, hierarchical and empirical Bayes, Bayesian decision theory
6. Frequentist statistics: bootstrap, statistical testing
7. Linear regression: ordinary least squares, robust linear regression, ridge regression, Bayesian linear regression
8. Logistic regression and optimization: (Bayesian) logistic regression, optimization, L2 regularization, Laplace approximation, Bayesian information criterion
9. Generalized linear models: the exponential family, probit regression
10. Expectation maximization (EM) algorithm with applications
11. Latent linear models: principal component analysis, Bayesian PCA

Learning outcomes

At the end of the module, students are able to:

1. recall the concepts of supervised and unsupervised learning and implement cross-validation procedures
2. recall the concepts of Bayesian probability and of conditional and unconditional dependence
3. mathematically derive the models and inference procedures of Bayesian linear regression, generalized linear models, Bayesian principal component analysis, and k-means
4. identify use cases for the above-mentioned models
5. apply the above-mentioned models using the R programming language
6. assess the performance and significance of their results
7. develop simple novel Bayesian models, and inference procedures for them, for situations in which the above-mentioned models do not apply

Prerequisites

Linear algebra and multivariate calculus

Courses, teaching and learning methods, and literature

Courses and dates

Type            SWS (weekly hours)   Title                                                         Lecturer(s)   Dates
Lecture (VO)    4                    Statistical Modeling and Machine Learning (IN2332)                          Tuesday, 14:00–17:00
Exercise (UE)   4                    Exercise Statistical Modeling and Machine Learning (IN2332)                 Friday, 12:30–15:30

Teaching and learning methods

The class is based on Christopher Bishop's book "Pattern Recognition and Machine Learning". The lecture is held in inverted-classroom style. Each week, we give a ~30 min overview of the next reading assignment, a section of the book, pointing out the essential messages to make the reading at home easier. Exercises to be solved by the next lecture are assigned, including mathematical derivations of some results from the book. In the next lecture, the exercises are discussed (~30 min) and questions and difficulties with the material are addressed (~20 min). Then, practical exercises using the newly acquired material are solved in teams, using the R statistics framework (100 min). Further exercises are done in smaller groups during the Friday classes (3 hours). In our experience, the inverted-classroom style is better suited than conventional lecturing to quantitative topics that require students to think through or retrace mathematical derivations at their own pace.

Media

Weekly exercises (math and programming) posted online, slides, chalkboard, live demos

Literature

Pattern Recognition and Machine Learning by Christopher Bishop

Module exam

Description of examinations and coursework

The learning outcomes are assessed by a two-hour written final exam. It includes knowledge questions (learning outcomes 1, 2, 4), statistical modeling questions (derivation of the likelihood and of the inference procedure for a model not seen in class; learning outcomes 3, 7), a small amount of R programming (learning outcome 5), and interpretation of results (learning outcome 6).

Repeatability

A retake is offered in the following semester.
