Please note! The course description is confirmed for two academic years, which means that, in general, the learning outcomes, assessment methods and key content stay unchanged. However, the course syllabus can specify or change how the course is executed in each realization, such as how the contact sessions are organized, how the assessment methods are weighted, or which materials are used.

LEARNING OUTCOMES

After this course, the student is able to

  • understand big data and platforms w.r.t. services, stakeholders, interactions and state-of-the-art technologies
  • understand key interactions and performance design patterns in big data platforms
  • produce designs of big data platforms with key services like data stores, data ingestion, batch and stream processing
  • demonstrate design and implementation of big data ingestion, batch processing, stream processing and data governance processes
  • assess performance and reliability issues in operating big data platforms
  • deliver real-world prototypes of big data platforms with real datasets and technologies in large-scale systems

Credits: 5

Schedule: 13.01.2021 - 07.04.2021

Teacher in charge (valid 01.08.2020-31.07.2022): Linh Truong

Teacher in charge (applies in this implementation): Linh Truong

Contact information for the course (valid 09.12.2020-21.12.2112):

Due to COVID-19, the course will be held online. Students can contact the professor in charge and the TAs through:

  • Microsoft Teams chat/messages sent directly
  • Microsoft Teams chat/messages in the course space
  • Email

CEFR level (applies in this implementation):

Language of instruction and studies (valid 01.08.2020-31.07.2022):

Teaching language: English

Languages of study attainment: English

CONTENT, ASSESSMENT AND WORKLOAD

Content
  • Valid 01.08.2020-31.07.2022:

    The course provides knowledge covering the main aspects of big data platforms, including data platform services and ecosystems, architectures and designs for big data, core services in big data stores, big data ingestion techniques, big data processing models (batch and streaming), and big data governance. Common aspects such as interactions among users, developers and providers, as well as reliability, performance and elasticity of big data platforms, will be studied and implemented. The design, development and operation of big data platforms are all covered.

  • Applies in this implementation:

    Lectures:

    • Introduction to Big Data Platforms
    • Architecting Big Data Platforms
    • Service and Integration Models in Big Data Platforms
    • Big Data Storage and Database Services
    • Big Data Ingestion
    • Hadoop and Its Big Data Ecosystems
    • Big Data Processing with MapReduce/Spark Programming Models
    • Streaming Processing and Big Data Platforms
    • Workflows for Big Data Platforms

    Tutorials:

    • Some industrial and open source big data platforms for your tech radar
    • Hands-on examples with big database services
    • Data Ingestion with Apache NiFi
    • Hadoop
    • Data Processing with Apache Spark
    • Stream Processing with Apache Flink
    • Data Processing with Apache Airflow

    Meetups:

    • A Taste of Big Data Platforms
    • How to succeed on assignments in Big Data Platforms
    • Issues in time series data ingestion
    • Big Data Platforms and Microservices





Assessment Methods and Criteria
  • Valid 01.08.2020-31.07.2022:

    Assignments and exams (based on Q/A for assignments). Each assignment will include theoretical concepts, big datasets, component designs, software implementation and testing, and extensibility/integration discussions.

  • Applies in this implementation:

    Three assignments will be given.


Workload
  • Valid 01.08.2020-31.07.2022:

    Lectures: 10 (2), Teaching in small groups: 7 (1), Independent work, including self-study and assignments: 88

    Note the workload ratios:

    Method        Teaching hours   Independent work   Total workload
    Lecture       20               20                 40
    Exercise      7                0                  7
    Assignments   -                88                 88
    Total         -                -                  135

     

  • Applies in this implementation:

    Lecture:

    • Teaching hours: 18, Independent work: 18, Total workload: 40

    Exercise (hands-on and meetups):

    • Teaching hours: 7
    • Meetups: 4

    Assignments:

    • 88 hours

DETAILS

Study Material
  • Valid 01.08.2020-31.07.2022:

    Lecture slides, tutorials, open-source resources, and assignments

  • Applies in this implementation:

Prerequisites
  • Valid 01.08.2020-31.07.2022:

    This course requires background knowledge of cloud computing, distributed computing, operating systems, and basic databases. To meet this requirement, students must either (1) complete the following courses at Aalto: CS-C3140 Operating Systems and CS-E4150 Cloud Software and Systems, or (2) demonstrate that they understand relevant concepts and technologies such as distributed computing infrastructures, service discovery, virtualization and containers, distributed filesystems and databases. Furthermore, students must be able to program well in one or two of the following common programming languages: Java, JavaScript, GoLang, Python, and Scala.

    Prerequisites will be checked through students' completed courses and/or through a pre-assignment or interview with the responsible teachers.

    It is an advantage if students have completed courses covering topics in Parallel Computing, NoSQL databases and Service Design, can work with programming languages beyond those mentioned above, and are familiar with working in large-scale computing environments.

FURTHER INFORMATION

Details on the schedule