CS-E4640 - Big Data Platforms D, 13.01.2021-07.04.2021
This course space end date is set to 07.04.2021
Topic outline
-
The lectures cover only concepts, designs, and selected examples of technologies. We therefore offer hands-on tutorials and discussions on practical systems and design choices. In the tutorial sessions, we run examples and discuss issues related to real-world implementations. Each tutorial is short and aims to help students work with real systems.
In total, we will have 7 hours of tutorials.
Click here to see tutorial videos.
Note that the detailed content of the tutorials will be updated.
-
A walkthrough of key industrial and open-source big data platforms that matter for industrial and real-world applications and that you can use in your studies (e.g., from Google, Microsoft, Amazon, and Apache open-source projects).
-
Hands-on tutorials on understanding performance and consistency, using Cassandra as an example. You will practice with a production-level deployment of Cassandra and a real-world dataset.
-
Data Ingestion with Apache NiFi
Hands-on tutorial with Apache NiFi for moving data among different services. You will also practice with RabbitMQ and cloud storage.
-
Hands-on with Hadoop
You will do hands-on activities with a Hadoop system, focusing on the basics of the Hadoop filesystem and Hive.
-
In this tutorial, you will practice writing Spark code and testing it on a production-level Spark cluster, using a real dataset, e.g., New York Taxi data.
-
This tutorial covers setting up Apache Flink and developing stream data processing applications with it.
-
In this tutorial, you will practice with Apache Airflow and develop example data processing workflows.
-