
Essential Machine Learning for Physicists

Module NAT3009

This module handbook describes the contents, learning outcomes, teaching methods, and examination type of the module, and links to the current dates for courses and the module examination in the respective sections.

Basic Information

NAT3009 is a semester module that is offered irregularly.

This module is included in the following catalogues within the study programs in physics.

  • Specific catalogue of special courses for condensed matter physics
  • Specific catalogue of special courses for nuclear, particle, and astrophysics
  • Specific catalogue of special courses for Biophysics
  • Specific catalogue of special courses for Applied and Engineering Physics

Unless stated otherwise for export to a non-physics program, the student workload is given in the following table.

Total workload  Contact hours  Credits (ECTS)
150 h           60 h           5 CP

The responsible coordinator of the module NAT3009 is Zinonas Zinonos.

Content, Learning Outcome and Preconditions

Content

Machine Learning (ML) has been called the new electricity of the 21st century's technological revolution and now pervades many of the applications that accompany us in daily life, from spam email filtering to face recognition. ML is also a critical factor in major scientific accomplishments that would be hard to achieve by conventional means. Recent examples include the discovery of the Higgs boson in proton collisions and the detection of gravitational waves from merging compact binary systems.

This course is designed to be the physicist's most complete resource for learning how ML works and how it can be applied to physics research problems involving large scientific datasets. We will start with the basics, learning how to handle and manipulate large datasets using the NumPy and Pandas libraries in Python. Then we will dive deeper into working with data by learning about visualization with the Pandas, Matplotlib, and Seaborn libraries.

Afterward, we will get into the heart of the course, covering general prediction models. We will begin with the basics of regression, using classic statistical approaches with SciPy and statsmodels. Then we will move on to scikit-learn, one of the most widely used Python libraries for machine learning, as well as other state-of-the-art libraries such as XGBoost, LightGBM, and CatBoost. We will learn how to build ML systems that combine numerous features and provide reliable predictions, for example in weather forecasting or air quality prediction.

Then we will use these libraries for supervised classification tasks, such as separating physics signals from background events with high accuracy. We will extend this knowledge to more advanced supervised learning methods for imbalanced classification problems, such as the detection of very rare phenomena, where our machine learning models must pick out patterns and key characteristics from vast amounts of data.

During the course, we will cover fundamental ML concepts such as the curse of dimensionality, feature selection for building robust models, model underfitting and overfitting, model metrics, and data leakage. You will also learn efficient techniques for optimizing learning algorithms, evaluating your trained models, and cross-validating them with a variety of methods.

Moreover, the course is packed with practical exercises based on physics and real-life examples. So not only will you learn the theory, but you will also get essential hands-on practice building your own models.

By the end of the course, physics students will have acquired a thorough understanding of ML concepts as well as the skills needed to apply ML principles to challenging physics problems with real data.

So, what are you waiting for? Get enrolled now and become a real machine-learning professional in physics!

Learning Outcome

In this course, you will be guided step by step into the world of Machine Learning with applications in physics. With every lecture and tutorial, you will develop new skills and improve your understanding of this challenging and rapidly developing field of data science.

This course is fun and exciting, but at the same time we dive deep into the foundations of Machine Learning. The content is structured in the following way:

Part 1: Data Management & Data Visualization

  • NumPy
  • Pandas Dataframes
  • Data visualization with Python libraries (Matplotlib, Seaborn)

Part 2: Data Preprocessing & Feature Engineering

  • Handle missing data
  • Encode categorical (nominal and ordinal) data
  • Handle outliers
  • Feature scaling
  • Data partitioning into train and test samples
  • Imputation of missing class values 
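As a minimal sketch of how several of the Part 2 steps fit together, the example below uses plain NumPy on hypothetical toy data (the array names, the 80/20 split, and mean imputation are illustrative assumptions, not course material). It partitions the data first and derives the imputation and scaling statistics from the training sample only, which is one common way to avoid data leakage.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical feature matrix: 100 events, 3 features on very different scales.
X = rng.normal(loc=[0.0, 50.0, -3.0], scale=[1.0, 10.0, 0.1], size=(100, 3))
# Knock out a few entries to mimic missing measurements.
X[rng.integers(0, 100, size=10), rng.integers(0, 3, size=10)] = np.nan

# 1) Partition into train and test samples (80/20) after shuffling.
indices = rng.permutation(len(X))
n_train = int(0.8 * len(X))
X_train, X_test = X[indices[:n_train]], X[indices[n_train:]]

# 2) Impute missing values with column means of the TRAINING sample only.
train_means = np.nanmean(X_train, axis=0)
X_train = np.where(np.isnan(X_train), train_means, X_train)
X_test = np.where(np.isnan(X_test), train_means, X_test)

# 3) Feature scaling (standardization) with training-sample statistics.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)
X_train_scaled = (X_train - mu) / sigma
X_test_scaled = (X_test - mu) / sigma
```

The test sample is transformed with `train_means`, `mu`, and `sigma` rather than with its own statistics, so no information from the test events enters the preprocessing.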

Part 3: Regression

  • Simple Linear Regression
  • Multiple Linear Regression
  • Polynomial Regression
  • Support Vector Regression
  • Decision Tree Regression
  • Random Forest Regression

Part 4: Regression Project

  • Train a regression model on big data and make predictions
  • Control model overfitting & underfitting
  • Regression metrics
  • Feature selection
  • Model optimization with hyperparameter grid search 
  • Cross-validation
  • Model evaluation
  • Full model training and deployment to make predictions
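The interplay of underfitting, overfitting, grid search, and cross-validation listed above can be illustrated with a small NumPy-only sketch. The data, the helper `kfold_mse`, and the grid of polynomial degrees are hypothetical choices made for this example; in practice a library such as scikit-learn would typically handle these steps.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical noisy measurements of a quadratic law y = 1 + 2x - 3x^2.
x = rng.uniform(-1.0, 1.0, size=60)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.size)


def kfold_mse(x, y, degree, n_folds=5):
    """Mean validation MSE of a polynomial fit, averaged over n_folds folds."""
    all_indices = np.arange(x.size)
    errors = []
    for fold in np.array_split(all_indices, n_folds):
        train = np.setdiff1d(all_indices, fold)
        coeffs = np.polyfit(x[train], y[train], deg=degree)
        residuals = y[fold] - np.polyval(coeffs, x[fold])
        errors.append(np.mean(residuals**2))
    return float(np.mean(errors))


# Grid search over a single hyperparameter, the polynomial degree.
degrees = [0, 1, 2, 5, 10]
cv_scores = {d: kfold_mse(x, y, d) for d in degrees}
best_degree = min(cv_scores, key=cv_scores.get)

# Full model training: refit on all data with the selected hyperparameter.
final_coeffs = np.polyfit(x, y, deg=best_degree)
```

Degrees that are too low cannot follow the curvature (underfitting), while very high degrees start to fit the noise (overfitting); the cross-validated error in `cv_scores` is what makes the two cases comparable.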

Part 5: Classification

  • Logistic Regression
  • Support Vector Machines
  • Kernel SVM
  • Naive Bayes
  • Decision Tree Classification
  • Random Forest Classification
  • Boosting methods

Part 6: Classification Project

  • Train a classification model on big data and make predictions
  • Control model overfitting & underfitting
  • Handle class (label) imbalance
  • Classification metrics
  • Feature selection
  • Model optimization with hyperparameter grid search 
  • K-fold cross-validation
  • Model evaluation
  • Full model training and deployment to make predictions
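To see why class imbalance and the choice of classification metric matter, consider the following sketch with hypothetical labels: 990 background events and 10 signal events. The function `classification_metrics` and both sets of predictions are invented for illustration.

```python
import numpy as np

# Hypothetical imbalanced sample: 990 background events (0), 10 signal events (1).
y_true = np.array([0] * 990 + [1] * 10)

# A trivial "classifier" that always predicts background.
y_always_background = np.zeros_like(y_true)

# A classifier that finds 8 of the 10 signal events at the cost of 12 false alarms.
y_model = np.zeros_like(y_true)
y_model[:12] = 1      # 12 background events wrongly flagged as signal
y_model[990:998] = 1  # 8 signal events correctly flagged


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 score for binary labels (1 = signal)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / y_true.size,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


trivial = classification_metrics(y_true, y_always_background)
model = classification_metrics(y_true, y_model)
```

Here the trivial classifier reaches an accuracy of 0.99 but a recall of 0, while the second classifier has a slightly lower accuracy of 0.986 with a recall of 0.8 and a precision of 0.4. Accuracy alone would therefore favor the classifier that never finds a signal event.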

Part 7: State-of-the-art machine learning libraries

  • XGBoost
  • CatBoost
  • LightGBM (Microsoft)

Part 8: Dimensionality Reduction

  • Principal Component Analysis
  • Linear Discriminant Analysis
  • Kernel PCA

Part 9: Clustering

  • k-Means Clustering
  • Hierarchical Clustering
  • Density-Based Spatial Clustering
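The first method in this list, k-means clustering, can be sketched in a few lines of NumPy. This is a simplified illustration on hypothetical, well-separated data; the function `kmeans` and its farthest-first initialization are assumptions made for this example and differ from the initialization schemes used by production libraries.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Hypothetical data: three well-separated 2D blobs of 50 points each.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])


def pairwise_distances(points, centroids):
    """Euclidean distance from every point to every centroid."""
    return np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)


def kmeans(X, k, n_iter=100):
    # Farthest-first initialization (an illustrative choice, not the only one).
    centroids = X[:1].copy()
    for _ in range(k - 1):
        nearest = pairwise_distances(X, centroids).min(axis=1)
        centroids = np.vstack([centroids, X[np.argmax(nearest)]])

    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = np.argmin(pairwise_distances(X, centroids), axis=1)
        # Update step: move each centroid to the mean of its points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans(X, k=3)
```

The algorithm alternates between assigning points to the nearest centroid and recomputing the centroids until they stop moving. The number of clusters `k` has to be chosen by the user, which is one of the main differences from hierarchical and density-based methods.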

Part 10: Model Deployment

  • Model persistence
  • Model API
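Model persistence means storing a trained model so that it can be reloaded later without retraining. The sketch below is a minimal illustration using Python's standard `pickle` module; the dictionary-based "model" and the file name `model.pkl` are hypothetical stand-ins for a real trained estimator.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np

# Fit a hypothetical straight-line model to four points lying on y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = {"kind": "polynomial", "coeffs": np.polyfit(x, y, deg=1)}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"
    # Save the trained model to disk ...
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)
    # ... and load it back, as a later prediction job would do.
    with model_path.open("rb") as fh:
        restored = pickle.load(fh)

# Use the restored model to predict at a new point.
prediction = float(np.polyval(restored["coeffs"], 4.0))
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.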

Preconditions

No preconditions in addition to the requirements for the Master’s program in Physics.

For this course, you will simply need a computer that can run Python and Jupyter notebooks, as well as access to GitHub.

Courses, Learning and Teaching Methods and Literature

Courses and Schedule

Type  SWS  Title                                                  Lecturer(s)                                          Dates                    Links
VO    2    Essential Machine Learning for Physicists              Zinonos, Z.                                          singular or moved dates  documents
UE    2    Exercise to Essential Machine Learning for Physicists  Hessler, J. (Responsible/Coordination: Zinonos, Z.)  singular or moved dates

Learning and Teaching Methods

The course content will be presented interactively in Jupyter notebooks and shared in the classroom. This course balances theory and practical implementation, with complete Jupyter notebook guides containing code and easy-to-reference notes. There are also plenty of exercises to improve your new skills along the way! By the end of the course, you will be proficient in the following areas:

  • Python Programming
  • Data manipulation and visualization with famous Python libraries
  • Numerical processing with Python libraries
  • Principles of Machine Learning
  • Supervised Machine Learning: regression and classification
  • Building efficient Machine Learning systems and solving physics problems with data

Media

The lectures, as well as the exercises, will be delivered in Jupyter notebooks, a web-based interactive computing platform.

The material will be versioned and distributed over GitHub.

Literature

Links to the Jupyter notebooks in GitHub repositories will be shared during the lectures and labs. References to further online reading material will be distributed throughout the lectures.

Recommended textbooks:

  1. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, and Jerome Friedman
  2. Applied Predictive Modeling. Max Kuhn and Kjell Johnson
  3. Introduction to Machine Learning with Python: A Guide for Data Scientists. Andreas C. Müller and Sarah Guido
  4. The Hundred-Page Machine Learning Book. Andriy Burkov
  5. Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow. Sebastian Raschka and Vahid Mirjalili
  6. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Aurélien Géron
  7. Understanding Machine Learning: From Theory to Algorithms. Shai Shalev-Shwartz and Shai Ben-David
  8. Machine Learning. Tom M. Mitchell

Module Exam

Description of exams and course work

There will be a written exam of 60 minutes duration. The exam tests, by way of example, whether the competencies listed in the Learning Outcome section have been achieved at least to the stated cognition level, using comprehension questions and sample calculations.

For example, an assignment in the exam might be:

  • Explanation of regression and classification metrics.
  • How do classification models work?
  • How do regression models work?
  • How do ensemble methods work?
  • How does boosting in decision trees work?
  • How to optimize a learning model?
  • How to control overfitting and underfitting of learning models?
  • Which are the main data preprocessing practices?

Exam Repetition

The exam may be repeated at the end of the semester.

Current exam dates

Currently TUMonline lists the following exam dates. In addition to the general information above please refer to the current information given during the course.

Title: Exam to Essential Machine Learning for Physicists

Time                               Location  Info  Registration
Wed, 2023-02-22, 12:00 till 13:00  019             till 2023-01-15 (cancellation of registration till 2023-02-15)
Tue, 2023-03-28, 12:00 till 13:00  019             till 2023-03-21