# Introduction to Data Analysis

## Module PH2099

This module handbook describes the contents, learning outcomes, teaching methods and examination type of the module, and links to the current dates for the courses and the module examination in the respective sections.

### Module version of SS 2020

Historical descriptions of this module also exist. A module description remains valid until it is replaced by a newer one.

Whether the module’s courses are offered in a specific semester is listed in the section "Courses, Learning and Teaching Methods and Literature" below.

Available module versions: SS 2021, SS 2020, SS 2019, SS 2018, WS 2016/7, WS 2010/1.

### Basic Information

PH2099 is a one-semester module taught in German or English at Master’s level and is offered in the winter semester.

This module is included in the following catalogues within the physics study programs:

- Specific catalogue of special courses for nuclear, particle, and astrophysics
- Complementary catalogue of special courses for condensed matter physics
- Complementary catalogue of special courses for Biophysics
- Complementary catalogue of special courses for Applied and Engineering Physics

Unless stated otherwise for export to a non-physics program, the student workload is given in the following table.

| Total workload | Contact hours | Credits (ECTS) |
|---|---|---|
| 150 h | 60 h | 5 CP |

The responsible coordinator of module PH2099 in the SS 2020 version was Boris Grube.

### Content, Learning Outcome and Preconditions

#### Content

The module gives an introduction to the basic techniques for analyzing experimental data. Among other things, it covers the following topics:

- The scientific method
- The concept of probability and its interpretations
- Bayes' theorem
- Random variables
- Probability distributions and their moments
- Important distributions: binomial, multinomial, Poisson and Gaussian distribution
- Multivariate distributions
- Marginal and conditional probability distributions
- Covariance and correlation coefficient
- Functions of (multiple) random variables
- Central limit theorem
- Gaussian uncertainty propagation for n-dimensional functions and covariance matrix
- Statistical and systematic uncertainties
- Parameter estimation using the method of least squares
- Estimating the goodness of fit
- Parameter estimation using the (extended) maximum-likelihood method
- Relation between least-squares and maximum-likelihood method
- Estimating the significance of a signal
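As a rough illustration of the uncertainty-propagation topic listed above: first-order Gaussian propagation for a vector-valued function $y = f(x)$ transforms the input covariance matrix as $V_y = J V_x J^T$, where $J_{ij} = \partial f_i / \partial x_j$ is evaluated at the input values. The sketch below implements this rule in plain Python. It assumes central finite differences for the Jacobian (the step size `h` is an illustrative choice), and the function names `jacobian` and `propagate` are hypothetical, not taken from the course material.

```python
"""First-order (Gaussian) uncertainty propagation: V_y = J V_x J^T."""
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def jacobian(f: Callable[[Sequence[float]], List[float]],
             x: Sequence[float], h: float = 1e-6) -> Matrix:
    """Numerical Jacobian J[i][j] = d f_i / d x_j using central differences."""
    m = len(f(x))
    jac = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        x_up, x_dn = list(x), list(x)
        x_up[j] += h
        x_dn[j] -= h
        f_up, f_dn = f(x_up), f(x_dn)
        for i in range(m):
            jac[i][j] = (f_up[i] - f_dn[i]) / (2.0 * h)
    return jac


def propagate(f: Callable[[Sequence[float]], List[float]],
              x: Sequence[float], cov_x: Matrix) -> Matrix:
    """Covariance matrix of y = f(x) to first order in the uncertainties of x."""
    jac = jacobian(f, x)
    m, n = len(jac), len(x)
    # V_y[i][k] = sum over j, l of J[i][j] * V_x[j][l] * J[k][l]
    return [[sum(jac[i][j] * cov_x[j][l] * jac[k][l]
                 for j in range(n) for l in range(n))
             for k in range(m)]
            for i in range(m)]


if __name__ == "__main__":
    # Two uncorrelated inputs x1 = 2.0 +- 0.1 and x2 = 3.0 +- 0.2;
    # outputs are their sum and their product.
    cov_y = propagate(lambda v: [v[0] + v[1], v[0] * v[1]],
                      [2.0, 3.0], [[0.01, 0.0], [0.0, 0.04]])
    print(cov_y)  # approximately [[0.05, 0.11], [0.11, 0.25]]
```

Although the two inputs are uncorrelated, the sum and the product come out correlated (non-zero off-diagonal element). Keeping the full covariance matrix preserves this information, which would be lost by propagating the variance of each output separately.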

#### Learning Outcome

After successful completion of this module, students are able to

- understand and apply fundamental statistical concepts
- understand and apply basic data-analysis techniques to suitable data
- apply first-order uncertainty propagation in the most general case
- estimate and correctly interpret statistical and systematic uncertainties
- estimate model parameters by performing fits to (multi-dimensional) data
- estimate the statistical significance of signals in the presence of background
- (when attending the tutorials) develop tools for moderately complex data-analysis tasks using the Python programming language
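To make the outcome "estimate model parameters by performing fits" concrete, here is a minimal least-squares sketch of the kind that could be written in the tutorials. It assumes a straight-line model $y = a + b x$ with independent Gaussian uncertainties $\sigma_i$ on $y$ only, for which minimizing $\chi^2 = \sum_i (y_i - a - b x_i)^2 / \sigma_i^2$ has a closed-form solution. The parameter covariance matrix is the inverse of the normal-equation matrix, and comparing $\chi^2$ with the number of degrees of freedom gives a rough goodness-of-fit check (for a correct model with correctly estimated uncertainties, $\chi^2$ is expected to be close to ndf). The function name `fit_line` is hypothetical.

```python
"""Weighted least-squares fit of a straight line y = a + b*x."""
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float], sigma: Sequence[float]
             ) -> Tuple[Tuple[float, float], List[List[float]], float, int]:
    """Minimise chi^2 analytically.

    Returns the parameters (a, b), their covariance matrix, chi^2 and
    the number of degrees of freedom (n - 2).
    """
    w = [1.0 / s ** 2 for s in sigma]  # weights 1 / sigma_i^2
    s0 = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Normal equations: [[s0, sx], [sx, sxx]] @ [a, b] = [sy, sxy]
    det = s0 * sxx - sx ** 2
    a = (sxx * sy - sx * sxy) / det
    b = (s0 * sxy - sx * sy) / det
    # Inverse of the normal-equation matrix = covariance matrix of (a, b)
    cov = [[sxx / det, -sx / det],
           [-sx / det, s0 / det]]
    chi2 = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    ndf = len(x) - 2
    return (a, b), cov, chi2, ndf


if __name__ == "__main__":
    # Illustrative data lying exactly on y = 1 + 2x, each with sigma = 1.
    (a, b), cov, chi2, ndf = fit_line([0, 1, 2, 3], [1, 3, 5, 7], [1, 1, 1, 1])
    print(a, b, cov, chi2, ndf)
```

For real data, `chi2 / ndf` far above 1 hints at an inadequate model or underestimated uncertainties, and far below 1 at overestimated ones.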

#### Preconditions

No preconditions in addition to the requirements for the Master’s program in Physics.

### Courses, Learning and Teaching Methods and Literature

#### Courses and Schedule

| Type | SWS | Title | Lecturer(s) | Dates | Links |
|---|---|---|---|---|---|
| VO | 2 | Introduction to Data Analysis | Grube, B. | Mon, 14:00–16:00 | eLearning |
| UE | 2 | Exercise to Introduction to Data Analysis | Grube, B. | dates in groups | eLearning |

#### Learning and Teaching Methods

This module consists of a lecture and an exercise course.

The goal of the lecture is to provide a solid theoretical background. To this end, methods and concepts will be derived, where possible, from first principles.

In the tutorials, the concepts explained in the lecture are applied to concrete examples, and students develop short Python programs in small groups. The examples are mostly taken from particle physics. However, the tutorials focus mainly on the statistical aspects of the problems and are designed so that they do not require in-depth prior knowledge of particle physics.
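The actual tutorial problems are not reproduced here. As a hedged illustration of the kind of short Python program meant, the sketch below estimates the significance of an excess in a simple counting experiment: given an expected background mean $b$ and an observed count $n_\text{obs}$, it computes the one-sided Poisson p-value $p = P(n \ge n_\text{obs} \mid b)$ and converts it into a Gaussian significance $Z$ with $p = 1 - \Phi(Z)$. It assumes that $b$ is known exactly (no systematic uncertainty) and uses only the standard library (`statistics.NormalDist` needs Python 3.8 or later); the function names are hypothetical.

```python
"""Significance of an excess over a known background in a counting experiment."""
import math
from statistics import NormalDist


def poisson_p_value(n_obs: int, b: float) -> float:
    """One-sided p-value P(n >= n_obs) for a Poisson count with known mean b > 0."""
    if n_obs <= 0:
        return 1.0
    # P(n_obs | b), evaluated in log space to avoid overflow for large counts
    term = math.exp(-b + n_obs * math.log(b) - math.lgamma(n_obs + 1))
    p, k = 0.0, n_obs
    # Sum the upper tail directly (more accurate for small p than 1 - CDF).
    while True:
        p += term
        k += 1
        term *= b / k  # P(k) = P(k - 1) * b / k
        # Past the mode (k > b) the terms only shrink, so stop once negligible.
        if k > b and term <= 1e-16 * p:
            return p


def significance(p: float) -> float:
    """Gaussian significance Z of a one-sided p-value, i.e. p = 1 - Phi(Z)."""
    # -Phi^{-1}(p) equals Phi^{-1}(1 - p) but avoids rounding 1 - p to 1.
    return -NormalDist().inv_cdf(p)


if __name__ == "__main__":
    # Illustrative numbers: 3.0 background events expected, 10 observed.
    p = poisson_p_value(10, 3.0)
    print(p, significance(p))
```

For these illustrative numbers the p-value is roughly 1.1e-3, corresponding to a significance of about 3.1 standard deviations.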

#### Media

Presentation with projector, board work, Smartboard, problem sheets

#### Literature

- G. Cowan: *Statistical Data Analysis*, Oxford University Press (1998)
- R. J. Barlow: *Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences*, Wiley (2008)
- S. Brandt: *Datenanalyse für Naturwissenschaftler und Ingenieure*, Springer Spektrum (2013)
- B. Roe: *Probability and Statistics in Experimental Physics*, Springer (2001)
- M. G. Kendall & A. Stuart: *The Advanced Theory of Statistics*, Vols. I-III, Charles Griffin (1961)
- V. Blobel & E. Lohrmann: *Statistische und numerische Methoden der Datenanalyse*, Teubner Studienbücher Verlag (1998)
- D. S. Sivia & J. Skilling: *Data Analysis: A Bayesian Tutorial*, Oxford University Press (2006)
- P. R. Bevington & D. K. Robinson: *Data Reduction and Error Analysis for the Physical Sciences*, McGraw-Hill (2002)
- L. Lyons: *Statistics for Nuclear and Particle Physicists*, Cambridge University Press (1989)

### Module Exam

#### Description of exams and course work

There will be a 30-minute oral exam. Using comprehension questions, it tests on a sample basis whether the competencies listed in the section Learning Outcome have been achieved, at least at the stated cognitive level.

Example questions in the exam might be:

- What is the mathematical definition of probability?
- What is the method of least squares?
- What is the method of maximum likelihood estimation?

Participation in the exercise classes is strongly recommended, since the exercises prepare students for the exam problems and train the specific competencies.

#### Exam Repetition

The exam may be repeated at the end of the semester.