Topic: Lectures | CS-E4640 - Big Data Platforms D, Lecture, 11.1.2023-13.4.2023

Lecture 1 - Introduction to Big Data Platforms

In this lecture we discuss what a big data platform is and the key motivations for studying big data platforms.
Lecture 1 - Architecting Big Data Platforms
We study and discuss key architectural principles for designing big data platforms.
your scenario/story of big data
data movement in big data platforms
basic big data pipelines
Lambda architecture
Kappa architecture
big data at large scale
key building blocks and technologies
reactive systems for big data platforms
partitioning
data concerns
component API, interaction, orchestration and coordination
component distribution
scalability and elasticity
Lecture 2 - Service and Integration Models in Big Data Platforms
We examine service models and integration for big data platforms.
Bring data into platforms
data transfer/uploading models
examples of technology stacks (Google, AWS, Azure)
Messaging protocols for big data
MQTT
AMQP
Optimizing service requests and functionalities
Contention, back-pressure, elasticity
Sharding
Discovery and consensus in big data platforms
Key techniques
Examples: ZooKeeper, Consul, etcd
Lecture 3 - Big Data Storage and Database Services
Big data storage, databases and services in big data platforms.
Consistency, Availability and Partition Tolerance
Basic models, CAP/BASE
Data models and data management
Data models (file, relational, key-value, document-oriented, column-family, graph)
Examples with cloud storage, Cassandra, MongoDB, etc.
Lecture 4 - Big Data Ingestion
Big data ingestion techniques.
Big data ingestion
Models
Data formats/semantics
Patterns for data ingestion
Ingestion processes: architectures and tools
Common
Batch models
Function-as-a-service models
Microbatching
Examples
E.g., Logstash, message brokers, Apache NiFi
Lecture 5 - Hadoop and its Big Data Ecosystem
We will discuss Hadoop and its key components in the big data ecosystem.
Distributed big data in clusters
Hadoop File systems
YARN
Hadoop-native big database/data warehouse systems
HBase
Apache Hive
Use Hadoop for complex data management and analytics
Lecture 6 - Big Data Processing with MapReduce/Spark Programming Models
MapReduce and Spark programming models for big data processing.
MapReduce programming model
Apache Spark
Real-world examples
Lecture 7 - Stream Processing and Big Data Platforms
Stream processing for big data and its relation to big data platforms.
Stream processing and big data platforms
Key concepts of stream processing
Event models, processing functions, windows, consistency
Parallelism in stream processing
Apache Flink
Lecture 8 - Workflows for Big Data Platforms
Workflow technologies and frameworks for big data.
The role of workflows in big data processing and platform management
Workflow models
Common concepts, workflows of batch tasks, workflows of function-as-a-service
Apache Airflow
Lecture 9 - New Trends in Big Data Platforms
