Cloud-Based Data Processing

Module IN2386

This module is offered by the TUM Department of Informatics.

This module handbook describes the module's contents, learning outcomes, teaching methods and examination type, and links to the current dates for courses and module examinations in the respective sections.

Basic Information

IN2386 is a semester module taught in English at Bachelor’s and Master’s level; it is offered in the winter semester.

This module is included in the following catalogues within the study programs in physics:

  • Focus Area Bio-Sensors in M.Sc. Biomedical Engineering and Medical Physics
  • Focus Area Imaging in M.Sc. Biomedical Engineering and Medical Physics
Total workload | Contact hours | Credits (ECTS)
150 h | 60 h | 5 CP

Content, Learning Outcome and Preconditions

Content

Introduction
* Intro to Data-centers
* Latest trends in Cloud Computing

Fundamentals for managing distributed data
* Data replication and quorums
* Data partitioning
* Fault-tolerance and unreliable components
* Distributed system model
* Consensus protocols and coordination services
* Consistency models and distributed key value stores

Design of cloud-based data processing systems
* Distributed transactions
* Cloud-native OLTP databases
* Cloud-native data warehouses
* Dataflow computing (i.e., derived data)
* Cloud-scale data streaming systems
* Query-as-a-Service (QaaS) (serverless data processing)
* Resource management and scheduling

Other related topics:
* Novel data storage formats (e.g., data lakes, data mesh)
* Security and Privacy for data processing in the cloud
* Accelerators and impact of new hardware technology

Learning Outcome

Upon successful completion of this module, students are able to:
* define the requirements and challenges when architecting, building and managing a large-scale data processing service in the cloud.
* use the theoretical foundations of distributed algorithms to construct the building blocks for a scalable data system design in relation to distributed storage, coordination and computation.
* understand and analyze the different trade-offs when designing scalable data processing systems that need to run in the public cloud.
* design, implement and evaluate a three-tier system using current cloud technologies.
* identify the scalability bottlenecks and vulnerabilities of a complex computer system.

Preconditions

IN0008 Fundamentals of Databases
IN0009 Basic Principles: Operating Systems and System Software
IN0010 Introduction to Computer Networking and Distributed Systems

Courses, Learning and Teaching Methods and Literature

Courses and Schedule

Type | SWS | Title | Lecturer(s) | Dates
VI | 5 | Cloud-Based Data Processing (IN2386) | Giceva Makreshanska, J.; Gruber, F. | Wed, 10:00–13:00, MI 02.09.014

Learning and Teaching Methods

Lectures, tutorials, problems for individual study:
The module consists of lectures and accompanying tutorials. The contents of the course will be primarily presented in the form of lectures and discussions of real-world system designs. Solutions to exercises will be discussed in the tutorials.

Media

Lecture with animated slides

Literature

• Designing Data-Intensive Applications by Martin Kleppmann
• Distributed Systems by Maarten van Steen, Andrew S. Tanenbaum
• Principles of Distributed Database Systems by M. Tamer Özsu, Patrick Valduriez

Module Exam

Description of exams and course work

The exam is a 90-minute written test, or an oral exam if the number of participants is low. The assignments check whether the student can identify the key requirements for data processing in the cloud and design a scalable system that meets them for a given scenario.

Exam Repetition

The exam may be repeated at the end of the semester.
