Topic outline

  • COVID-19 Notes

    Due to the COVID-19 situation, the current decision for Spring is to run the course online. Note that this is not a MOOC: the online mode means that lectures and hands-on sessions will be done through online sessions with no pre-recorded and live videos.

    Welcome to CS-E4640!

    The course is led by Hong-Linh Truong. It covers the main aspects of big data platforms, including platform understanding and design, core services in big data storage, big data ingestion, big data processing, and non-functional aspects such as reliability, data governance and quality management. Both the development and the operation of big data platforms are covered. Furthermore, services in big data platform ecosystems will be discussed.

    To join this course, students are expected to know the basics of cloud computing systems, database/data management, service design, and DevOps in cloud computing. It is an advantage if the student has already completed courses such as Mobile Cloud Computing, Software Architectures, and Concurrent Programming.

    Academic Audit vs Assessment (for Certificates) Groups

    Note that the course description lists some prerequisites to make sure that students have enough background to follow the course. In this course, we have two specific groups:

    • CourseAudit: a course participant should register for this group if the participant just wants to audit the course (no assignment submission for obtaining certificates).
    • Learn4Certificate: a course participant should register for this group if the participant wants to obtain the certificate/grade from the course (assignment submissions are allowed for obtaining certificates). For participants in this group, the course prerequisites might be checked.

    Course Story

    Imagine that you finish the course and become an "expert" in "Big Data Platforms" from @CSAalto. You work for a company, and one day you get a request to build a big data platform for the company with your team (in this course your team is you, playing different roles). You might get a description like:

    “Your team has to build a big data platform for X types of data. Data will be generated/collected from N sources. We expect to have 10+ GB/day of data to be ingested into our platform. We will have to serve K thousands of requests for different types of analytics – to be determined. Our response time should be within t milliseconds. Our services should not be …”

    @PS: and things will be added and changed

    And you know that big data is characterized by many V properties (volume, velocity, variety, veracity, ...) and that a platform must be able to facilitate different types of interactions for exchanging data and services, etc. You are faced with different questions related to the development and operation of big data platforms and their big data pipelines: How do you design a big data platform that is resilient, elastic and responsive, and that allows different customers and applications to be integrated? Which data models do you have to select? Do you have to support batch or streaming processing? There are also very practical issues: should you use public cloud infrastructures or build your own? Which cloud companies should you rely on: Google, Amazon or Microsoft? Your story is not centered around a "narrow scope" of big data processing, like taking a lot of data, putting it into Hadoop and running ML algorithms (although even the work in such a "narrow scope" is not easy to achieve). Instead, you need to deal with the big picture of many tasks in big data platforms, involving designs with microservices and serverless, reactive systems patterns, big data storage and databases, complex data ingestion, and various data processing models and the algorithms atop them, to name just a few.

    But of course, with the limited time of a 5-credit course, you cannot be the master of all aspects (BTW, who could be the master of big data, given the complexity of the field?). Thus you need to build your platform atop core concepts, practice your tasks with the four assignments, explore the best skills you have in the big field of "Big Data Platforms", and let your other team members work with you to deliver the "Big Data Platform" under your lead. Build your story!

    Build your big data platform story


    Course Schedule

    All dates in the agenda are booked for Lectures and Tutorials

      Some important notes: