Section description

  • The course has 9 lectures of 2 hours each.

    You should also spend 18 hours on self-study, so about 36 hours in total go to learning concepts. Additional learning time goes to tutorials and assignment work.

    • Basic information about the course will be given:

      • Important notes about grading: the course evaluation is based on assignments (including design, implementation and discussion)
      • Strict deadlines
      • Communication: via MyCourses, Slack and other means
      We will have a Q&A session on the first lecture date, 13.01.2021 (please download and read the slides in advance).

    • In this lecture we discuss what a big data platform is and the key motivations for studying big data platforms.

      The lecture will be held on 13.01.2020.

    • We study and discuss key architectural principles for designing big data platforms. The lecture will be on 20.01.2020

      • your scenario/story of big data
      • data movement in big data platforms
      • basic big data pipelines
        • Lambda architecture
        • Kappa architecture
      • big data at large scale
        • key building blocks and technologies
        • reactive systems for big data platforms
        • partitioning
        • data concerns
        • component API, interaction, orchestration and coordination
        • component distribution
        • scalability and elasticity
    • Cloud technologies are important for developing and operating big data platforms. We will discuss the roles of cloud infrastructures for big data platforms.

      • How cloud technologies affect big data platform designs
        • service models and virtualization
        • examples: Kubernetes, VMs, containers, ...
      • Cloud technologies empowering big data platforms
        • manage infrastructural resources for big data platforms
        • fault-tolerance, performance and elasticity
        • microservices and DevOps
    • We examine service models and integration for big data platforms. The lecture will be on 27.01.2020

      • Bring data into platforms
        • data transfer/uploading models
        • examples of technology stacks (Google, AWS, Azure)
      • Messaging protocols for big data
        • MQTT
        • AMQP
      • Optimizing service requests and functionalities
        • Contention, back-pressure, elasticity
        • Sharding
      • Discovery and consensus in big data platforms
        • Key techniques
        • Examples: ZooKeeper, Consul, etcd
    • Big data storage, databases and services in big data platforms. The lecture will be on 03.02.2020

      • Consistency, Availability and Partition Tolerance
        • Basic models, CAP/BASE
      • Data models and data management
        • Data models (File, relational data, Key-value model, document-oriented model, column family, graph)
        • Examples with cloud storage, Cassandra, MongoDB, etc.
    • Big data ingestion techniques. The lecture will be on 10.02.2020

      • Big data ingestion
        • Models
        • Data formats/semantics
        • Patterns for data ingestion
      • Ingestion processes: architectures and tools
        • Common
        • Batch models
        • Function-as-a-service models
        • Microbatching
      • Examples
        • E.g., Logstash, message brokers, Apache NiFi
    • We will discuss Hadoop and its key components in the big data ecosystem. The lecture will be on 03.03.2020

      • Distributed big data in clusters
      • Hadoop File systems
      • YARN
      • Hadoop-native big database/data warehouse systems
        • HBase
        • Apache Hive
      • Use Hadoop for complex data management and analytics
    • MapReduce and Spark programming models for big data processing. The lecture will be on 10.03.2020

      • MapReduce programming model
      • Apache Spark
      • Real-world examples
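
As a small preview of the MapReduce programming model, the following is a minimal single-process sketch of a word count in plain Python. It is not Hadoop or Spark code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. It shows the three conceptual steps: map emits key-value pairs, shuffle groups them by key, and reduce aggregates each group.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    # In a real cluster the map and reduce calls run in parallel on many nodes;
    # here everything runs sequentially in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data platforms", "big data pipelines"])
print(counts)
```

In a distributed setting the framework, not the programmer, handles the shuffle step and the distribution of map and reduce tasks across machines.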

    • Stream processing for big data and its relation to big data platforms. The lecture will be on 24.03.2020


      • Stream processing and big data platforms
      • Key concepts of stream processing
        • Event models, processing functions, windows, consistency
        • Parallelism in stream processing
      • Apache Flink
    • Workflow technologies and frameworks for big data. The lecture will be on 31.03.2020

      • The role of workflows in big data processing and platform management
      • Workflow models
        • Common concepts, workflows of batch tasks, workflows of function-as-a-service
      • Apache Airflow
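
To make the idea of a workflow of batch tasks concrete, here is a minimal sketch that models a workflow as a dependency graph and runs the tasks in dependency order. It uses only the Python standard library (`graphlib`, Python 3.9+) and is not Airflow code; the helper `run_workflow` and the task names `ingest`, `clean` and `report` are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task once, after all of its upstream tasks have finished.

    tasks: maps a task name to a callable.
    dependencies: maps a task name to the list of tasks it depends on.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        # Pass the outputs of the upstream tasks as positional arguments.
        upstream = [results[dep] for dep in dependencies.get(name, [])]
        results[name] = tasks[name](*upstream)
    return order, results


tasks = {
    "ingest": lambda: [3, 1, 2],          # produce raw records
    "clean": lambda rows: sorted(rows),   # normalize the records
    "report": lambda rows: sum(rows),     # aggregate into a result
}
dependencies = {"ingest": [], "clean": ["ingest"], "report": ["clean"]}

order, results = run_workflow(tasks, dependencies)
print(order, results["report"])
```

Workflow frameworks add what this sketch leaves out, such as scheduling, retries, monitoring and distributed execution of the tasks.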