This website is no longer updated.

As of 1.10.2022, the Faculty of Physics has been merged into the TUM School of Natural Sciences, whose website is https://www.nat.tum.de/. For more information, read Conversion of Websites.


Mining Massive Datasets

Module IN2323

This Module is offered by TUM Department of Informatics.

This module handbook describes the contents, learning outcomes, teaching methods, and examination type of the module, and links to the current dates for its courses and module examination in the respective sections.

Module version of SS 2015

There are historic module descriptions of this module. A module description is valid until replaced by a newer one.

Whether the module’s courses are offered during a specific semester is listed in the section Courses, Learning and Teaching Methods and Literature below.

Available module versions: WS 2019/20, SS 2015

Basic Information

IN2323 is a one-semester module at Master’s level, taught in English and offered in the summer semester.

This module description is valid until SS 2019.

Total workload: 150 h
Contact hours: 60 h
Credits (ECTS): 5 CP

Content, Learning Outcome and Preconditions

Content

1. Introduction
* Machine Learning, Data Mining and Knowledge Discovery Process
* Applications, Tasks

2. High-Dimensional Data
* Hashing & Sketches
- Min-Hashing
- Locality Sensitive Hashing
* Dimensionality Reduction & Matrix Factorization
- Feature Selection & Random Projections
- Non-Negative Matrix Factorization and Extensions
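The Min-Hashing and Locality Sensitive Hashing topics above can be illustrated with a short sketch: a MinHash signature whose agreement rate estimates the Jaccard similarity of two sets, plus the LSH banding test that turns signatures into candidate pairs. This is not course material; the function names, the random linear hash family modulo a Mersenne prime, and the use of `zlib.crc32` to map tokens to integers are assumptions made for illustration.

```python
import random
import zlib


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of character k-shingles of a string."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(tokens: set[str], num_hashes: int = 128, seed: int = 0) -> list[int]:
    """MinHash signature using random hash functions h(x) = (a*x + b) mod p."""
    if not tokens:
        raise ValueError("MinHash is undefined for an empty set")
    p = (1 << 61) - 1  # Mersenne prime, larger than any 32-bit token id
    rng = random.Random(seed)  # same seed => same hash family for every set
    params = [(rng.randrange(1, p), rng.randrange(0, p)) for _ in range(num_hashes)]
    # crc32 gives a stable integer id per token (Python's hash() is salted per process).
    ids = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    return [min((a * x + b) % p for x in ids) for a, b in params]


def estimate_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    if len(sig1) != len(sig2):
        raise ValueError("signatures must have equal length")
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


def jaccard(s1: set[str], s2: set[str]) -> float:
    """Exact Jaccard similarity, for comparison."""
    return len(s1 & s2) / len(s1 | s2)


def lsh_is_candidate(sig1: list[int], sig2: list[int], bands: int) -> bool:
    """LSH banding: a pair is a candidate if all rows of at least one band agree."""
    rows = len(sig1) // bands
    return any(sig1[b * rows:(b + 1) * rows] == sig2[b * rows:(b + 1) * rows]
               for b in range(bands))
```

For example, `estimate_jaccard(minhash_signature(shingles("mining massive datasets")), minhash_signature(shingles("mining massive data sets")))` approximates the exact `jaccard` of the two shingle sets. With 128 hashes split into 32 bands of 4 rows, `lsh_is_candidate` flags pairs whose similarity is high with high probability while rarely flagging dissimilar ones; the band and row counts are tuning choices, not prescribed values.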

3. Graphs / Networks
* Laws, Patterns and Generators
* Spectral Graph Theory
- Ranking (e.g., PageRank, HITS)
- Community Detection
* Probabilistic Models
- Stochastic Blockmodel (SBM)
- (Stochastic) Variational Inference
- Belief Propagation
* Representation Learning for Graphs
- Deep Learning for Graph Data
- (Unsupervised) Node Embeddings
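The ranking item above (PageRank) can be sketched as a power iteration with teleportation. This is an illustrative sketch, not the course's implementation: the damping factor `beta = 0.85` is a common default rather than a value from this module, and redistributing the rank of dead-end nodes uniformly is one of several conventions for handling them.

```python
def pagerank(out_links: dict[str, list[str]], beta: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """PageRank by power iteration; out_links maps each node to its successors."""
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Teleport term: with probability 1 - beta, jump to a random node.
        new = {v: (1.0 - beta) / n for v in nodes}
        dangling_mass = 0.0
        for v in nodes:
            targets = out_links.get(v, [])
            if targets:
                share = beta * rank[v] / len(targets)
                for w in targets:
                    new[w] += share
            else:
                # Dead end: collect its rank to spread over all nodes.
                dangling_mass += beta * rank[v]
        for v in nodes:
            new[v] += dangling_mass / n
        delta = sum(abs(new[v] - rank[v]) for v in nodes)  # L1 change
        rank = new
        if delta < tol:
            break
    return rank
```

Because the teleport and dead-end terms redistribute exactly the mass that does not follow links, the ranks stay a probability distribution (they sum to 1) at every iteration.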

4. Temporal Data & Streaming
* Sampling & Sketches
- Bloom Filter
- Counting Distinct Elements
- Estimating Moments
* Kalman Filter
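The Bloom Filter item in the streaming section can be illustrated with a compact sketch. It uses the standard sizing formulas m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hash functions for n expected items and target false-positive rate p; deriving the k positions from one SHA-256 digest via double hashing is an implementation choice assumed here, not something the module prescribes.

```python
import hashlib
import math


class BloomFilter:
    """Set membership with no false negatives and tunable false positives."""

    def __init__(self, capacity: int, error_rate: float = 0.01):
        # Standard sizing for `capacity` items at the target false-positive rate.
        self.m = max(1, math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing: g_i(x) = h1(x) + i * h2(x) mod m, from one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True if all k bits are set; may be a false positive, never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

After `bf = BloomFilter(capacity=1000)` and adding 1000 items, every added item tests as present, and unseen items should be reported as present at roughly the configured `error_rate`; inserting more than `capacity` items raises that rate.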

Learning Outcome

Upon successful completion of this module, students will be able to describe data mining and machine learning methods and their applicability to large datasets and complex object types. Students will become familiar with systems for processing massive datasets and will understand concepts for scaling up data mining algorithms. Furthermore, students will be able to understand, apply, and evaluate principles for analyzing complex data such as graphs, network data, and temporal data.

Preconditions

Core modules of the Bachelor’s program in Informatics, semesters 1 to 4

Courses, Learning and Teaching Methods and Literature

Courses and Schedule

Learning and Teaching Methods

Lecture, problems for individual study, assignments including project work

Media

Slides, exercise sheets, white board, project work

Literature

- Mining of Massive Datasets. Jure Leskovec, Anand Rajaraman, Jeffrey David Ullman. Cambridge University Press. 2014
- Data Mining: The Textbook. Charu Aggarwal. Springer. 2015
- The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Trevor Hastie, Robert Tibshirani, Jerome Friedman. Springer. 2011

Module Exam

Description of exams and course work

The assessment is a 90-minute written exam. Knowledge questions verify familiarity with data mining and machine learning models and algorithms, as well as with systems for large-scale analysis. Programming assignments verify the ability to implement and critically evaluate advanced algorithms and methods for analyzing massive data. Finally, small scenarios with defined applications must be set up by applying the methods learned, which verifies the ability to develop precise partial solutions.

Exam Repetition

The exam may be repeated at the end of the semester.
