
Mining Massive Datasets

Module IN2323

This module is offered by the TUM Department of Informatics.

This module handbook describes the contents, learning outcomes, methods, and examination type, and links to current dates for courses and the module examination in the respective sections.

Basic Information

IN2323 is a semester module taught in English at Master’s level and offered in the summer semester.

This module is included in the following catalogue within the study programs in physics.

  • Catalogue of non-physics elective courses
Total workload   Contact hours   Credits (ECTS)
150 h            60 h            5 CP

Content, Learning Outcome and Preconditions


Content

1. Introduction
* Machine Learning, Data Mining and Knowledge Discovery Process
* Applications, Tasks

2. High-Dimensional Data
* Hashing & Sketches
- Min-Hashing
- Locality Sensitive Hashing
* Dimensionality Reduction & Matrix Factorization
- Feature Selection & Random Projections
- Non-Negative Matrix Factorization and Extensions

3. Graphs / Networks
* Laws, Patterns and Generators
* Spectral Graph Theory
- Ranking (e.g., PageRank, HITS)
- Community Detection
* Probabilistic Models
- Stochastic Blockmodel (SBM)
- (Stochastic) Variational Inference
- Belief Propagation
* Representation Learning for Graphs
- Deep Learning for Graph Data
- (Unsupervised) Node Embeddings

4. Temporal Data & Streaming
* Sampling & Sketches
- Bloom Filter
- Counting Distinct Elements
- Estimating moments
* Kalman Filter

Learning Outcome

Upon successful completion of this module, students will be able to describe data mining and machine learning methods and their applicability to large datasets and complex object types. They will become familiar with systems for processing massive datasets and will understand concepts for scaling up data mining algorithms. Furthermore, they will be able to understand, apply, and evaluate principles for analyzing complex data such as graphs, network data, and temporal data.


Preconditions

Core modules of the Bachelor’s program in Informatics, semesters 1-4

Courses, Learning and Teaching Methods and Literature

Courses and Schedule

VI 4 Mining Massive Datasets (IN2323) Günnemann, S.
Assistants: Bojchevski, A.; Shchur, O.
Wed, 14:00–16:00, Interims I 101
Thu, 14:00–16:00, Interims I 101

Learning and Teaching Methods

Lecture, problems for individual study, assignments including project work


Slides, exercise sheets, white board, project work


Literature

- Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman. Cambridge University Press. 2014
- Data Mining: The Textbook. Charu Aggarwal. Springer. 2015
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, Jerome Friedman. Springer. 2011

Module Exam

Description of exams and course work

The academic assessment consists of a 90-minute written exam. Assignments that check knowledge verify familiarity with data mining and machine learning models and algorithms as well as with systems for large-scale analysis; programming assignments verify the ability to implement and critically evaluate advanced algorithms and methods for the analysis of massive data; small scenarios with defined applications, to be set up by applying the methods learned, verify the ability to develop precise partial solutions.

Exam Repetition

The exam may be repeated at the end of the semester.

Current exam dates

Currently TUMonline lists the following exam dates. In addition to the general information above please refer to the current information given during the course.

Mining Massive Datasets
Fri, 9.8.2019, 13:30 to 15:00, MW: 2001, PH: 2502
Import until 30.6.2019 (deregistration until 2.8.2019)