Essential Machine Learning for Physicists
Module NAT3009
Basic Information
NAT3009 is a semester module that is offered irregularly.
This module is included in the following catalogues within the study programs in physics:
- Specific catalogue of special courses for condensed matter physics
- Specific catalogue of special courses for nuclear, particle, and astrophysics
- Specific catalogue of special courses for Biophysics
- Specific catalogue of special courses for Applied and Engineering Physics
Unless stated otherwise for export to a non-physics program, the student workload is given in the following table.
Total workload | Contact hours | Credits (ECTS) |
---|---|---|
150 h | 60 h | 5 CP |
The responsible coordinator of module NAT3009 is Zinonas Zinonos.
Content, Learning Outcome and Preconditions
Content
Machine Learning (ML) is often called the new electricity of the 21st century's technological revolution, and it now underlies many of the applications that accompany us in daily life, from spam email filtering to face recognition. ML has also been a critical factor in major scientific accomplishments that would have been unthinkable by conventional means. Recent scientific achievements include the Higgs boson discovery in proton collisions and the detection of gravitational waves from merging compact binary systems.
This course is designed to be the physicist's most complete resource for learning how ML works and how it can be applied to physics research problems involving big scientific data. We will start with the basics, learning how to handle and manipulate big datasets using the NumPy and Pandas libraries with Python. Then we will dive deeper into working with data by learning about visualization with the Pandas, Matplotlib, and Seaborn libraries.
Afterward, we will get into the heart of the course, covering general prediction models. We will begin with the basics of regression, using classic statistical approaches with SciPy and statsmodels. Then we will move on to scikit-learn, one of the most widely used Machine Learning suites, as well as other state-of-the-art libraries such as XGBoost, LightGBM, and CatBoost. We will learn how to build ML systems that combine numerous features and provide reliable predictions, for example in weather forecasting or air quality prediction.
Then we will use these libraries for supervised classification tasks, such as separating physics signals from background events with high accuracy. We will extend this knowledge to more complex supervised learning methods for imbalanced classification problems, such as the detection of very rare phenomena, where our machine learning models must pick out patterns and key characteristics from lakes of data. During the course, we will cover fundamental ML concepts such as the curse of dimensionality, feature selection for building robust models, model underfitting and overfitting, model metrics, and data leakage. You will also learn efficient techniques for optimizing learning algorithms, evaluating your trained models, and cross-validating them through a variety of methods.
Moreover, the course is packed with practical exercises that are based on physics and real-life examples. So not only will you learn the theory, but you will also get essential hands-on practice building your own models. At the end of the course, physics students will acquire a thorough understanding of ML concepts as well as all skills needed to apply ML principles to challenging physics problems with real data.
So, what are you waiting for? Get enrolled now and become a real machine-learning professional in physics!
Learning Outcome
In this course, you will be walked step by step into the world of Machine Learning with applications in Physics. With every lecture and tutorial, you will develop new skills and improve your understanding of this challenging and rapidly growing field of Data Science.
This course is fun and exciting, but at the same time, we dive deep into the foundations of Machine Learning. The content is structured as follows:
Part 1: Data Management & Data Visualization
- NumPy
- Pandas Dataframes
- Data visualization with python libraries (Matplotlib, Seaborn)
Part 2: Data Preprocessing & Feature Engineering
- Handle missing data
- Encode categorical (nominal and ordinal) data
- Handle outliers
- Feature scaling
- Data partitioning into train and test samples
- Imputation of missing class values
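The preprocessing steps of Part 2 can be sketched as follows. This is a minimal illustration with scikit-learn and Pandas, not the course's own notebook code; the toy dataset and its column names (`energy`, `detector`, `label`) are hypothetical. The split is done first so that imputation and scaling statistics are learned from the training sample only, which avoids data leakage.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric feature with a missing value,
# one nominal feature, and a binary label.
df = pd.DataFrame({
    "energy": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0],
    "detector": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "label": [0, 1, 0, 1, 0, 1, 0, 1],
})
X, y = df[["energy", "detector"]], df["label"]

# Partition into train and test samples before any statistics are computed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Numeric column: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Nominal column: one-hot encode; categories unseen in training are ignored.
preprocess = ColumnTransformer(
    [
        ("num", numeric, ["energy"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["detector"]),
    ],
    sparse_threshold=0,  # always return a dense array
)

X_train_t = preprocess.fit_transform(X_train)  # fit on the training sample only
X_test_t = preprocess.transform(X_test)        # reuse the fitted statistics
```

Wrapping the steps in a single transformer means the same object can later be placed in front of any regressor or classifier inside a `Pipeline`.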
Part 3: Regression
- Simple Linear Regression
- Multiple Linear Regression
- Polynomial Regression
- Support Vector Regression
- Decision Tree Regression
- Random Forest Regression
Part 4: Regression Project
- Train a regression model on big data and make predictions
- Control model overfitting & underfitting
- Regression metrics
- Feature selection
- Model optimization with hyperparameter grid search
- Cross-validation
- Model evaluation
- Full model training and deployment to make predictions
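As a rough sketch of the regression workflow listed above, the example below tunes a random forest with a hyperparameter grid search and 5-fold cross-validation, then evaluates it on a held-out test sample. The synthetic data, the grid values, and the choice of model are assumptions made for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data (assumption): a nonlinear target with Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Shallow trees tend to underfit, deep trees can overfit;
# cross-validation picks the depth that generalizes best.
param_grid = {"max_depth": [2, 4, 8], "n_estimators": [50, 100]}
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

# With the default refit=True, the best model is retrained on the full training sample.
best = search.best_estimator_
y_pred = best.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(search.best_params_, mse, r2)
```

The test sample is touched only once, after the grid search has finished, so the reported metrics are not biased by the model selection.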
Part 5: Classification
- Logistic Regression
- Support Vector Machines
- Kernel SVM
- Naive Bayes
- Decision Tree Classification
- Random Forest Classification
- Boosting methods
Part 6: Classification Project
- Train a classification model on big data and make predictions
- Control model overfitting & underfitting
- Handle class label imbalance
- Classification metrics
- Feature selection
- Model optimization with hyperparameter grid search
- K-fold cross-validation
- Model evaluation
- Full model training and deployment to make predictions
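The classification workflow, including class imbalance, can be illustrated in a similar way. The sketch below assumes a synthetic dataset in which only about 5% of the events are "signal" and uses class weighting as one possible remedy; resampling is another option that is not shown here. Plain accuracy is misleading for such data, so average precision and recall are used instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)

# Synthetic imbalanced data (assumption): roughly 95% background, 5% signal.
X, y = make_classification(
    n_samples=2000,
    n_features=10,
    n_informative=5,
    n_clusters_per_class=1,
    weights=[0.95, 0.05],
    random_state=0,
)

# Stratify so that train and test keep the same signal fraction.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights events inversely to class frequency.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)

# Stratified K-fold keeps the rare class present in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, X_train, y_train, cv=cv, scoring="average_precision")

clf.fit(X_train, y_train)
recall = recall_score(y_test, clf.predict(X_test))
print(scores.mean(), recall)
```

A classifier that ranks events at random would reach an average precision close to the signal fraction, so that fraction is the natural baseline for the cross-validated scores.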
Part 7: State-of-the-art machine learning libraries
- XGBoost
- CatBoost
- LightGBM (Microsoft)
Part 8: Dimensionality Reduction
- Principal Component Analysis
- Linear Discriminant Analysis
- Kernel PCA
Part 9: Clustering
- k-Means Clustering
- Hierarchical Clustering
- Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
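To give a feeling for how dimensionality reduction and clustering fit together, here is a small sketch that projects five-dimensional data onto two principal components and then groups the points with k-Means. The two Gaussian blobs are synthetic and deliberately well separated, an assumption that real data rarely satisfies.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): two well-separated blobs in 5 dimensions.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=1.0, size=(100, 5))
blob_b = rng.normal(loc=6.0, scale=1.0, size=(100, 5))
X = np.vstack([blob_a, blob_b])

# PCA is sensitive to feature scales, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Keep the two directions of largest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)

# Cluster in the reduced space; no labels are used at any point.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)
```

The explained variance ratio indicates how much information survives the projection; if the first components capture only a small fraction, clustering in the reduced space may miss structure.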
Part 10: Model Deployment
- Model persistence
- Model API
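Model persistence can be sketched in a few lines. The example assumes `joblib`, which is installed together with scikit-learn, and a hypothetical file name `model.joblib`; it covers only saving and restoring a trained model, not serving it through an API.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Train a trivial model on exact data: y = 2x + 1.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
model = LinearRegression().fit(X, y)

# Save the fitted model to disk and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.joblib")  # hypothetical file name
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model predicts without retraining.
pred = restored.predict(np.array([[10.0]]))
print(pred)
```

Files written this way rely on pickle, so they should only be loaded from trusted sources and preferably with the same library versions that were used to save them.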
Preconditions
No preconditions in addition to the requirements for the Master’s program in Physics.
For this course, you will simply need a computer that can run Python and Jupyter notebooks, and access to GitHub.
Courses, Learning and Teaching Methods and Literature
Courses and Schedule
Type | SWS | Title | Lecturer(s) | Dates | Links |
---|---|---|---|---|---|
VO | 2 | Essential Machine Learning for Physicists | Zinonos, Z. | Wed, 12:00–14:00, LMU-HS and singular or moved dates | documents |
UE | 2 | Exercise to Essential Machine Learning for Physicists | Hessler, J. (Responsible/Coordination: Zinonos, Z.) | Wed, 14:00–16:00, PH HS3 and singular or moved dates | |
Learning and Teaching Methods
The course content will be presented interactively in Jupyter notebooks and shared in the classroom. This course balances theory and practical implementation, with complete Jupyter notebook guides of code and easy-to-reference notes. We also have plenty of exercises to improve your new skills along the way! At the end of the course, you will be proficient in the following areas:
- Python Programming
- Data manipulation and visualization with famous Python libraries
- Numerical processing with Python libraries
- Principles of Machine Learning
- Supervised Machine Learning: Regression and Classification
- Building efficient Machine Learning systems and solving physics problems with data
Media
The lectures, as well as the exercises, will be delivered on a web-based interactive computing platform known as Jupyter notebooks.
The material will be versioned and distributed over GitHub.
Literature
Links to Jupyter Notebooks over GitHub repositories will be shared during the lectures and labs. References to further online reading material will be distributed throughout the lectures.
Recommended textbooks:
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, and Jerome Friedman
- Applied Predictive Modeling. Max Kuhn
- Introduction to Machine Learning with Python: A Guide for Data Scientists. Andreas C. Müller and Sarah Guido
- The Hundred-Page Machine Learning Book. Andriy Burkov
- Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow. Sebastian Raschka and Vahid Mirjalili
- Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Aurélien Géron
- Understanding Machine Learning: From Theory to Algorithms. Shai Shalev-Shwartz and Shai Ben-David
- Machine Learning. Tom M. Mitchell
Module Exam
Description of exams and course work
There will be a written exam of 60 minutes duration. The exam tests, by way of example, whether the competencies listed in the Learning Outcome section have been achieved at least to the stated cognition level, using comprehension questions and sample calculations.
For example, an assignment in the exam might be:
- Explain regression and classification metrics.
- How do classification models work?
- How do regression models work?
- How do ensemble methods work?
- How does boosting in decision trees work?
- How to optimize a learning model?
- How to control overfitting and underfitting of learning models?
- Which are the main data preprocessing practices?
Exam Repetition
The exam may be repeated at the end of the semester.