Please note! The course description is confirmed for two academic years, which means that, in general, the learning outcomes, assessment methods and key content stay unchanged. However, the course syllabus may specify or change how the course is carried out in each realization, such as how the contact sessions are organized, how the assessment methods are weighted, or which materials are used.

LEARNING OUTCOMES

After completing the course, students will be able to

  • Define the requirements, building blocks, and challenges of architecting, building and managing a large-scale data-center infrastructure for distributed systems and databases.
  • Know the design principles of scalable data-intensive systems and their applications, and implement code to engineer such systems; analyze the full system stack for managing and scheduling data-center resources in relation to distributed storage, coordination and computation.
  • Critically assess the trade-offs between different requirements when designing scalable distributed systems; understand the trade-offs in converting between data models and database tools.
  • Understand new data models and new storage technologies, as well as their impact on query execution, database systems, cloud platforms, data processing pipelines, and modern machine learning systems.
  • Discuss, compare and criticize the state-of-the-art research approaches presented in research papers targeting distributed systems, databases and machine learning systems.

Credits: 5

Schedule: 06.09.2024 - 29.11.2024

Teacher in charge (valid for whole curriculum period):

Teacher in charge (applies in this implementation): Zhao

Contact information for the course (applies in this implementation):

CEFR level (valid for whole curriculum period):

Language of instruction and studies (applies in this implementation):

Teaching language: English. Languages of study attainment: English

CONTENT, ASSESSMENT AND WORKLOAD

Content
  • valid for whole curriculum period:

    This course covers the design and implementation of scalable data management systems. Topics include data-center technologies, data models (relational, document, key/value), storage models, query languages, storage architectures, indexing, query processing and optimization, in-memory databases, distributed storage, distributed coordination (consensus protocols and their use cases), transaction processing and concurrency control, new storage media, and parallel architectures (multicore, multi-socket, and chiplet), as well as case studies of open-source and commercial distributed database systems that illustrate these techniques and trade-offs.

Assessment Methods and Criteria
  • valid for whole curriculum period:

    Exercises and assignments.

DETAILS

Study Material
  • valid for whole curriculum period:

    Lecture slides, tutorials, open-source software, scientific papers, and assignments

Substitutes for Courses
Prerequisites
SDG: Sustainable Development Goals

    4 Quality Education

    5 Gender Equality

    8 Decent Work and Economic Growth

    9 Industry, Innovation and Infrastructure

    11 Sustainable Cities and Communities

    13 Climate Action

FURTHER INFORMATION

Further Information
  • valid for whole curriculum period:

    Teaching Language: English

    Teaching Period: 2024-2025 Autumn I - II
    2025-2026 Autumn I - II

    Registration:

    Participation is limited to a maximum quota of 50 students. Enrollments will be prioritized according to the following criteria: first, master's students for whom the course is mandatory; second, students with strong systems programming skills (e.g., C/C++/Rust); and finally, other students who fulfill the prerequisites.