Overview

We will study the basic MapReduce and Spark programming models for big data processing.

  • Please also revisit the lecture on Hadoop and its big data ecosystems.
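To make the MapReduce programming model concrete before the readings, here is a minimal single-process sketch of word count, the canonical example from the Dean and Ghemawat paper, written in plain Python. It only mimics the three logical phases (map, shuffle/group by key, reduce) in memory. The function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and are not part of the Hadoop or Spark APIs. A real framework runs these phases in parallel across a cluster and handles failures.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Map: emit an intermediate (word, 1) pair for every whitespace-separated word.
    return [(word, 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group all intermediate values by key (done by the framework in real MapReduce).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    # Hypothetical driver that chains the three phases sequentially in one process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["spark and hadoop", "hadoop mapreduce and spark"]
counts = word_count(docs)
print(counts)  # {'spark': 2, 'and': 2, 'hadoop': 2, 'mapreduce': 1}
```

For comparison, in Spark's RDD API the same job is commonly written as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)`, where `sc` is a `SparkContext` and `path` is a placeholder for the input location.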

Reading List

  • Bill Chambers, Matei Zaharia, Spark: The Definitive Guide, O'Reilly Media, Inc., [Book](https://learning.oreilly.com/library/view/spark-the-definitive/9781491912201/), [Code](https://github.com/databricks/Spark-The-Definitive-Guide)
  • Matei Zaharia, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph Gonzalez, Scott Shenker, and Ion Stoica. 2016. Apache Spark: a unified engine for big data processing. Commun. ACM 59, 11 (October 2016), 56-65. DOI: https://doi.org/10.1145/2934664
  • Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: simplified data processing on large clusters. Commun. ACM 51, 1 (January 2008), 107-113. DOI: https://doi.org/10.1145/1327452.1327492
  • Jeffrey Dean and Sanjay Ghemawat. 2004. MapReduce: simplified data processing on large clusters. In Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6 (OSDI'04), Vol. 6. USENIX Association, Berkeley, CA, USA, 10-10.
  • Tom White, Hadoop: The Definitive Guide, 4th Edition, O'Reilly Media, Inc. [Link](https://learning.oreilly.com/library/view/hadoop-the-definitive/9781491901687/)

Last modified: Tuesday, 9 March 2021, 6:49 PM