Goal of this lab - Supercomputing for Big Data - Lab Manual

Goal of this Lab

The goal of this lab is to achieve the Course Learning Objectives, which we repeat here.

By the end of this course, you will be able to:

ID	Description
L1	Use basic big data processing systems like Hadoop and MapReduce.
L2	Implement parallel algorithms using the in-memory Spark framework, and streaming using Kafka.
L3	Use libraries to simplify implementing more complex algorithms.
L4	Identify the relevant characteristics of a given computational platform to solve big data problems.
L5	Utilize knowledge of hardware and software tools to produce an efficient implementation of the application.

We will achieve these learning objectives by performing the following tasks, spread out over three labs:

You will work with Apache Spark, the MapReduce programming paradigm, and the Scala programming language, which is widely used in this domain.
You will approach a big data problem analytically and practically.
You will work with cloud-based systems.
You will deal with existing infrastructures for big data.
You will modify an existing application to operate in a streaming data context.