Please note! The course description is confirmed for two academic years, which means that, in general, the learning outcomes, assessment methods and key content stay unchanged. However, via the course syllabus, it is possible to specify or change the course execution in each realization of the course, such as how the contact sessions are organized, how the assessment methods are weighted, or which materials are used.

LEARNING OUTCOMES

After completing the course, students will be able to

  • Acquire hands-on experience in working with large-scale data-intensive computing environments, including software stacks, data management and hardware in data centers.
  • Understand how to translate real-world problems into requirements for data-intensive applications and solve them by leveraging distributed systems/big data systems/machine learning systems on top of high-performance computing (HPC) clusters and supercomputers.
  • Apply studied frameworks and techniques to build real-world data-intensive solutions and to systematically evaluate these solutions.

Credits: 5

Schedule: 10.01.2025 - 04.04.2025

Teacher in charge (valid for whole curriculum period):

Teacher in charge (applies in this implementation): Maarit Korpi-Lagg, Linh Truong, Zhao

Contact information for the course (applies in this implementation):

CEFR level (valid for whole curriculum period):

Language of instruction and studies (applies in this implementation):

Teaching language: English. Languages of study attainment: English

CONTENT, ASSESSMENT AND WORKLOAD

Content
  • valid for whole curriculum period:

    This research project’s learning goal is to familiarize the students with data-intensive computing software stacks (compilers, databases, big data frameworks, modern machine learning frameworks) and optimization techniques to efficiently manage data-center resources (GPUs, CPUs, InfiniBand connections and storage) to solve real-world problems (e.g., training large language models or running scientific simulations across a large number of GPUs/CPUs). During this course, the students will work with existing tools/frameworks to solve challenging problems over real-world data sets. In addition, the students will learn to collaborate with team members and learn about other tools/frameworks/applications through demonstrations by peers.

    For the research activities, the course will introduce a preliminary list of topics. Students are also encouraged to propose their own topics. Supercomputers, HPC clusters, and distributed computing and storage environments will be set up for the course participants (mainly hosted at CSC-IT Center for Science). Depending on the selected topics, students can either work individually or in teams.

    The course will provide introductory lecture(s) on how to use the environments. The course will run weekly meetings with the students/teams to discuss materials and instructions for the project (introduction to the tool, documentation, research papers, demonstration of the usage). After the introductory session(s), the weekly meetings will serve as project update sessions. At the end of the course, the students (teams) are expected to give demonstrations of their project to the other students (teams).



Assessment Methods and Criteria
  • valid for whole curriculum period:

    To pass the course, the student has to pass three components:

    • Min. 80% participation in the weekly sessions
    • Programming/development activities with real cases/machines
    • Successful, accepted final presentation and demonstration to other students, and participation in others' final presentations

Workload
  • valid for whole curriculum period:

    Weekly meetings (12 × 2 = 24), self-study of the materials (15), project development (72), preparing and presenting the final work (12), participation in other students’ presentations (22).



DETAILS

Study Material
  • valid for whole curriculum period:

    • No textbook
    • Git repository of research articles and tutorials
    • Study materials are given/updated based on the research topics

Substitutes for Courses
Prerequisites

FURTHER INFORMATION

Further Information
  • valid for whole curriculum period:

    Teaching Language: English

    Teaching Period: 2024-2025 Spring III - IV
    2025-2026 Spring III - IV

    Registration:

    CCIS Master Programme, Doctoral Studies in Science and Engineering.

    Max. 12 students can be admitted.
