Goal of this Lab

The goal of this lab is to achieve the Course Learning Objectives, which we repeat here.

By the end of this course, you will be able to:

ID  Description
L1  Use basic big data processing systems like Hadoop and MapReduce.
L2  Implement parallel algorithms using the in-memory Spark framework, and streaming using Kafka.
L3  Use libraries to simplify implementing more complex algorithms.
L4  Identify the relevant characteristics of a given computational platform to solve big data problems.
L5  Utilize knowledge of hardware and software tools to produce an efficient implementation of the application.

We will achieve these learning objectives by performing the following tasks, spread out over three labs.

  • You will work with Apache Spark, the MapReduce programming paradigm, and the Scala programming language, which is widely used in this domain.
  • You will approach a big data problem analytically and practically.
  • You will work with cloud-based systems.
  • You will work with existing big data infrastructure.
  • You will modify an existing application to operate in a streaming data context.