Please note! The course description is confirmed for two academic years, which means that, in general, the learning outcomes, assessment methods, and key content stay unchanged. However, the course syllabus may specify or change how each realization of the course is run, such as how the contact sessions are organized, how the assessment methods are weighted, or which materials are used.


After this course, you will know how to write computationally intensive C or C++ code that makes efficient use of dozens of CPU cores. You will learn how to partition large-scale computations between multiple processor cores, and how to choose the best memory layout for your data structures. You will also get hands-on experience in offloading computations from CPUs to GPUs. You will learn new kinds of algorithm design techniques that are relevant in the context of parallel computers, and you will also learn which of these techniques actually work in practice on modern multicore CPUs and GPUs.

Credits: 5

Schedule: 19.04.2021 - 28.05.2021

Teacher in charge (valid 01.08.2020-31.07.2022): Jukka Suomela

Teacher in charge (applies in this implementation): Jukka Suomela

Contact information for the course (valid 29.03.2021-21.12.2112):

Our primary communication channel is Zulip; if you have any questions, please try to ask our course staff there! If you have difficulties joining Zulip or for some other reason it does not work for you, please email Jukka Suomela.

CEFR level (applies in this implementation):

Language of instruction and studies (valid 01.08.2020-31.07.2022):

Teaching language: English

Languages of study attainment: English


Content
  • Valid 01.08.2020-31.07.2022:

    This is a practical hands-on course on algorithm engineering for modern parallel computers. The students will learn how to design programs that make the best possible use of the computing power of multicore CPUs and GPUs. The course projects will cover both numerical and combinatorial problems; the sole objective is to solve the task at hand in the shortest possible time. We will learn a whole range of techniques for speeding up computations, from bit manipulation hacks and special CPU instructions to high-level techniques such as choosing the right memory layout that makes the best possible use of the cache hierarchy.
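To illustrate the memory-layout point above, here is a minimal C++ sketch; it is not taken from the course material. The function name `matmul_transposed` and the row-major `std::vector<float>` representation are assumptions made for this example. It multiplies two n × n matrices, first transposing the second one so that the innermost loop reads both operands from consecutive memory addresses.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): c = a * b for n x n matrices
// stored in row-major order. The second operand is transposed first, so the
// innermost loop walks linearly through both a and bt.
std::vector<float> matmul_transposed(const std::vector<float>& a,
                                     const std::vector<float>& b,
                                     std::size_t n) {
    // bt[j * n + i] holds b[i * n + j], i.e. the transpose of b.
    std::vector<float> bt(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            bt[j * n + i] = b[i * n + j];
        }
    }

    std::vector<float> c(n * n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float s = 0.0f;
            // Both a[i * n + k] and bt[j * n + k] are sequential in k.
            for (std::size_t k = 0; k < n; ++k) {
                s += a[i * n + k] * bt[j * n + k];
            }
            c[i * n + j] = s;
        }
    }
    return c;
}
```

Without the transpose, the inner loop would read `b` with a stride of n elements, which tends to waste most of each cache line for large n. Whether the extra transpose pays off in a given case is something to measure rather than assume.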

  • Applies in this implementation:

    Students will learn:

    • How to do multicore CPU programming (multithreading, OpenMP)
    • How to exploit instruction-level parallelism
    • How to use vector instructions (SIMD, AVX)
    • How to program GPUs (CUDA)
    • How to benefit from data reuse in registers (CPU and GPU), caches (CPU), and shared memory (GPU)
    • How to choose the right memory access pattern for CPU and GPU code
    • How to benchmark and identify performance bottlenecks
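As a small taste of the first item in the list above, the following is a minimal sketch of multicore programming with OpenMP; it is an illustration, not an exercise from the course. The function name `dot` is hypothetical. The `parallel for` directive with a `reduction` clause splits the loop iterations among threads and combines the per-thread partial sums at the end.

```cpp
#include <vector>

// Hypothetical helper (illustration only): dot product of two vectors of
// equal length, with the loop iterations shared among OpenMP threads.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    const long n = static_cast<long>(a.size());
    // Each thread accumulates its own private copy of sum; OpenMP adds the
    // private copies together when the loop finishes.
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

With GCC, the pragma takes effect when compiling with `-fopenmp`; without that flag it is ignored and the loop runs sequentially, producing the same result up to floating-point rounding order.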

    We will also discuss some more advanced material:

    • How to read assembly code produced by the compiler
    • How to use hardware and software prefetching
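For the software prefetching item above, here is a minimal sketch using the GCC builtin `__builtin_prefetch`. The function name `gather_sum` and the prefetch distance of 16 elements are assumptions chosen for illustration; a useful distance depends on the hardware and has to be found by benchmarking.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): sums data[idx[i]] over all i.
// The accesses to data are irregular, so the hardware prefetcher may not
// predict them; we hint the addresses a few iterations ahead instead.
long long gather_sum(const std::vector<int>& data,
                     const std::vector<std::size_t>& idx) {
    constexpr std::size_t ahead = 16;  // assumed distance, needs tuning
    long long sum = 0;
    for (std::size_t i = 0; i < idx.size(); ++i) {
        if (i + ahead < idx.size()) {
            // A hint only: it does not change the result of the program.
            __builtin_prefetch(&data[idx[i + ahead]]);
        }
        sum += data[idx[i]];
    }
    return sum;
}
```

Because the prefetch is only a hint, the function returns the same value with or without it; any benefit shows up only in running time.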

    We will use a Linux environment, GCC, and CUDA.

Assessment Methods and Criteria
  • Valid 01.08.2020-31.07.2022:

    Programming exercises.

  • Applies in this implementation:

    Solve the programming exercises correctly and efficiently, and submit your solutions on time via GitHub. There are both “recommended exercises” and “challenging exercises”. If you solve all recommended exercises correctly and sufficiently efficiently, you can get up to 77 points. The grade thresholds are:

    • 38 points: grade 1/5
    • 45 points: grade 2/5
    • 51 points: grade 3/5
    • 58 points: grade 4/5
    • 64 points: grade 5/5
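The thresholds above can be read as a simple lookup. The following sketch is only an illustration: the function name `grade_for_points` is hypothetical, and returning 0 for fewer than 38 points (meaning "no passing grade") is an assumption that the text does not state explicitly.

```cpp
// Hypothetical helper (illustration only): maps total points to a grade
// using the thresholds listed above. Returning 0 below 38 points is an
// assumption; the thresholds only define grades 1 to 5.
int grade_for_points(int points) {
    if (points >= 64) return 5;
    if (points >= 58) return 4;
    if (points >= 51) return 3;
    if (points >= 45) return 2;
    if (points >= 38) return 1;
    return 0;
}
```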

    For the full list of exercises, see

Workload
  • Applies in this implementation:

    5 credits / 6 weeks ≈ 22 hours / week:

    • lecture: 2 hours/week
    • exercise sessions: 0–4 hours/week
    • solving exercises and self-study: 16–20 hours/week


Study Material
  • Valid 01.08.2020-31.07.2022:

    Available online.

  • Applies in this implementation:

    Available at

Prerequisites
  • Valid 01.08.2020-31.07.2022:

    No prior knowledge of parallel programming is needed. Students should have a good understanding of computer programming, algorithms, and data structures, and a working knowledge of either the C or the C++ programming language. While this course is primarily targeted at Master's students, advanced Bachelor's students are welcome to join if they have sufficient background knowledge and programming skills. At a minimum, students should have completed all first-year and second-year courses of their Bachelor's degree.

SDG: Sustainable Development Goals

    9 Industry, Innovation and Infrastructure


Details on the schedule
  • Applies in this implementation:

    The important deadlines are:

    • Exercises: every Sunday at 23:59.
    • Prerequisite test: Friday, 23 April 2021, at 23:59.