Cloud-Based Data Processing
Module IN2386
This module handbook describes the contents, learning outcomes, teaching methods and examination type of the module, and links to current dates for courses and the module examination in the respective sections.
Basic Information
IN2386 is a semester module taught in English at Bachelor’s and Master’s level; it is offered in the winter semester.
This module is included in the following catalogues within the study programs in physics:
- Focus Area Bio-Sensors in M.Sc. Biomedical Engineering and Medical Physics
- Focus Area Imaging in M.Sc. Biomedical Engineering and Medical Physics
Total workload | Contact hours | Credits (ECTS) |
---|---|---|
150 h | 60 h | 5 CP |
Content, Learning Outcome and Preconditions
Content
Introduction
* Intro to Data-centers
* Latest trends in Cloud Computing
Fundamentals for managing distributed data
* Data replication and quorums
* Data partitioning
* Fault-tolerance and unreliable components
* Distributed system model
* Consensus protocols and coordination services
* Consistency models and distributed key value stores
Design of cloud-based data processing systems
* Distributed transactions
* Cloud-native OLTP databases
* Cloud-native data warehouses
* Dataflow computing (i.e., derived data)
* Cloud-scale data streaming systems
* Query-as-a-Service (QaaS) (serverless data processing)
* Resource management and scheduling
Other related topics:
* Novel data storage formats (e.g., data lakes, data mesh)
* Security and Privacy for data processing in the cloud
* Accelerators and impact of new hardware technology
Learning Outcome
Upon successful completion of this module, students are able to:
* define the requirements and challenges when architecting, building and managing a large-scale data processing service in the cloud.
* use the theoretical foundations of distributed algorithms to construct the building blocks for a scalable data system design in relation to distributed storage, coordination and computation.
* understand and analyze the different trade-offs when designing scalable data processing systems that need to run in the public cloud.
* design, implement and evaluate a three-tier system using current cloud technologies.
* identify the scalability bottlenecks and vulnerabilities of a complex computer system.
Preconditions
IN0008 Fundamentals of Databases
IN0009 Basic Principles: Operating Systems and System Software
IN0010 Introduction to Computer Networking and Distributed Systems
Courses, Learning and Teaching Methods and Literature
Courses and Schedule
Type | SWS | Title | Lecturer(s) | Dates | Links |
---|---|---|---|---|---|
VI | 5 | Cloud-Based Data Processing (IN2386) | Fent, P.; Georgoulakis Misegiannis, M.; Giceva Makreshanska, J. | Thu, 14:00–17:00, GALILEO 300 and singular or moved dates | eLearning documents |
Learning and Teaching Methods
Lectures, tutorials, problems for individual study:
The module consists of lectures and accompanying tutorials. The contents of the course will be primarily presented in the form of lectures and discussions of real-world system designs. Solutions to exercises will be discussed in the tutorials.
Media
Lecture with animated slides
Literature
• Designing Data-Intensive Applications by Martin Kleppmann
• Distributed Systems by Maarten van Steen, Andrew S. Tanenbaum
• Principles of Distributed Database Systems by M. Tamer Ozsu, Patrick Valduriez
Module Exam
Description of exams and course work
The exam takes the form of a 90-minute written test, or an oral exam if the number of participants is low. The assignments check whether the student can identify the key requirements for data processing in the cloud and design a scalable system that meets them for a given scenario.
Exam Repetition
The exam may be repeated at the end of the semester.