CS-E4640 - Big Data Platforms D, 13.01.2021-07.04.2021
This course space end date is set to 07.04.2021
COVID-19 Notes
Due to the COVID-19 situation, the current decision for Spring is to run the course online. Note that this is not a MOOC: online mode means that lectures and hands-on sessions will be held as online sessions, with no pre-recorded or live videos.
Welcome to CS-E4640!
The course is led by Hong-Linh Truong. It provides knowledge covering the main aspects of big data platforms, including platform understanding and design, core services in big data storage, big data ingestion, big data processing, and non-functional aspects such as reliability, data governance and quality management. Both development and operations of big data platforms are covered. Furthermore, services in big data platform ecosystems will be discussed.
Students joining this course are expected to know the basics of cloud computing systems, database/data management, service design and DevOps in cloud computing. It is an advantage for this Big Data Platforms course if the student has already completed courses such as Mobile Cloud Computing, Software Architectures, and Concurrent Programming.
Academic Audit vs Assessment (for Certificates) Groups
Note that we have some prerequisites in the course description to make sure that students have enough background to learn the course. In this course, we have two specific groups:
- CourseAudit: a course participant should register in this group if the participant just wants to audit the course (no assignment submissions for obtaining certificates).
- Learn4Certificate: a course participant should register in this group if the participant wants to obtain the certificate/grade from the course (assignment submissions are allowed for obtaining certificates). For participants in this group, the course prerequisites might be checked.
Course Story
Imagine that you finish the course and become an "expert" in "Big Data Platforms" from @CSAalto. You work for a company, and one day you get a request to build a big data platform for the company with your team (in this course your team is you, playing different roles). You might get a description like:
“Your team has to build a big data platform for X types of data. Data will be generated/collected from N sources. We expect to have 10+ GBs/day of data to be ingested into our platform. We will have to serve K thousands of requests for different types of analytics – to be determined. Our response time should be in t milliseconds. Our services should not be …”
@PS: and things will be added and changed
And you know that big data is characterized by many V properties (volume, velocity, variety, veracity, ...) and that a platform must be able to facilitate different types of interactions for exchanging data and services, etc. You are faced with different questions related to the development and operation of big data platforms and their big data pipelines: How do you design a big data platform that is resilient, elastic and responsive, and that allows different customers and applications to be integrated? Which data models do you have to select? Do you have to support batch or streaming processing? There are also very practical issues: should you use public cloud infrastructures or build your own? Which cloud companies should you rely on: Google, Amazon or Microsoft? Your story is not centered around a "narrow scope" of big data processing, like taking a lot of data, putting it into Hadoop and running ML algorithms (although even the work in such a "narrow scope" is not easy to achieve). Instead, you need to deal with the big picture of many tasks in big data platforms, involving designs with microservices and serverless, reactive systems patterns, big data storage and databases, complex data ingestion, and various data processing models and the algorithms atop them, to name just a few.
But of course, with the limited time of a 5-credit course, you cannot be the master of all aspects (BTW, who could be the master of big data, given the complexity of the field?). Thus you need to build your platform atop core concepts, practice your tasks with the four assignments, explore the best skills you have in the big "Big Data Platforms", and let your other team members work with you to deliver the "Big Data Platform" under your lead. Build your story!
Course Schedule
All dates in the agenda are booked for Lectures and Tutorials
- See the current agenda.
- No lecture day:
Some important notes:
- Basic course management information
- We use the Announcements to inform you about important information.
- You can use the General discussion space (a forum in my course) and Microsoft Teams for discussion.
- Before posting your questions, check the FAQ to see if it already answers them.