Welcome to CS-E4640!
This year the course will be led by Hong-Linh Truong. The course covers the main aspects of big data platforms, including platform understanding and design, core services for big data storage, big data ingestion, big data processing, and non-functional aspects such as reliability, data governance and quality management. Both the development and the operation of big data platforms are covered. Furthermore, services in big data platform ecosystems will be discussed. Compared with previous editions, contents and topics will be updated or added this year.
To join this course, students are expected to know the basics of cloud computing systems, databases/data management, service design and DevOps in cloud computing. It is an advantage if the student has already completed courses such as Mobile Cloud Computing, Software Architectures, and Concurrent Programming.
Imagine that you finish the course and become an "expert" in "Big Data Platforms" from @CSAalto. You work for a company, and one day you get a request to build a big data platform for the company with your team (in this course your team is you, playing different roles). You might get a description like:
“Your team has to build a big data platform for X types of data. Data will be generated/collected from N sources. We expect 10+ GB/day of data to be ingested into our platform. We will have to serve K thousand requests for different types of analytics – to be determined. Our response time should be within t milliseconds. Our services should not be …”
PS: things will be added and changed.
And you know that big data is characterized by many V properties (volume, velocity, variety, veracity, ...) and that a platform must facilitate different types of interactions for exchanging data and services. You are faced with different questions related to the development and operation of big data platforms and their big data pipelines: How do you design a big data platform that is resilient, elastic and responsive, and that allows different customers and applications to be integrated? Which data models do you have to select? Do you have to support batch or streaming processing? There are also very practical issues: Should you use public cloud infrastructures or build your own? Which cloud company should you rely on: Google, Amazon or Microsoft? Your story is not centered around a "narrow scope" of big data processing, like taking a lot of data, putting it into Hadoop and running ML algorithms (although even the work in such a "narrow scope" is not easy to achieve). Instead, you need to deal with the big picture of many tasks in big data platforms, involving designs with microservices and serverless, reactive systems patterns, big data storage and databases, complex data ingestion, and various data processing models and the algorithms atop them, to name just a few.
But of course, with the limited time of a 5-credit course, you cannot be the master of all aspects (by the way, who could be the master of big data, given the complexity of the field?). Thus you need to build your platform atop core concepts, practice your tasks with the four assignments, explore the best skills you have in the big field of "Big Data Platforms", and let your other team members work with you to deliver the "Big Data Platform" under your lead. Build your story!
- No lectures in the weeks of 21-25 Oct and 9-13 Dec (evaluation week)
- No lecture on the following specific days: (will be updated)
- No tutorial on the following specific days: 26.09 (will be updated)
Some important notes:
- We use the Announcements to share important information with you.
- You can use the General discussion space (a forum in my course) and the Big Data Platforms Slack for discussion. Note that the lecturer and TA won't be able to respond to or comment on messages in the forum and the Slack workspace in real time.
- Before posting a question, check the FAQ to see whether it is already answered there.