Docker
According to the Docker Documentation:
Docker is a platform for developers and sysadmins to develop, deploy, and run applications with containers. The use of Linux containers to deploy applications is called containerization. Containers are not new, but their use for easily deploying applications is. Containerization is increasingly popular because containers are:
- Flexible: Even the most complex applications can be containerized.
- Lightweight: Containers leverage and share the host kernel.
- Interchangeable: You can deploy updates and upgrades on-the-fly.
- Portable: You can build locally, deploy to the cloud, and run anywhere.
- Scalable: You can increase and automatically distribute container replicas.
- Stackable: You can stack services vertically and on-the-fly.
For this course, we use Docker primarily to ensure every student is using the exact same platform for their applications, and to avoid certain platform-specific issues and peculiarities.
You are not required to use Docker for this lab if you feel comfortable setting up the required tools on your own system.
A basic understanding of some Docker concepts helps in getting started with this course. Part 1: Orientation and setup of the Get Started Guide covers the basic Docker concepts used in this course.
Before trying the lab assignments and tutorials in the next sections, make sure you Install Docker (stable) and test your installation by running the simple hello-world image:
docker run hello-world
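If the hello-world image runs fine but you want to double-check that both the Docker client and the daemon are working, the standard docker version command reports both (this is a generic Docker check, not something specific to this lab):

# Prints client and server (daemon) versions; the server section only
# appears if the Docker daemon is actually running.
docker version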
Setting up Spark in Docker
In order to run Spark in a container, a Dockerfile is provided in the root of all repositories we will use in this lab, including the repository for the Getting Started guide. The Dockerfile can be used to build images for spark-submit to run your Spark application, spark-shell to run a Spark interactive shell, and the Spark history server to view event logs from application runs. You need to build these images before you get started. The Dockerfiles we provide assume that you run Docker from the folder in which they are located. Don't move them around! They will stop working.
To build a Docker image from the Dockerfile, we use docker build:

docker build --target <target> -t <tag> .

Here <target> selects the build target from the Dockerfile, <tag> sets the tag for the resulting image, and the trailing . sets the build context to the current working directory.
We use docker to pull and build the images we need to use Spark and SBT.

- sbt
  docker pull hseeberger/scala-sbt:11.0.12_1.5.5_2.12.14
  docker tag hseeberger/scala-sbt:11.0.12_1.5.5_2.12.14 sbt
- spark-shell
  docker build --target spark-shell -t spark-shell .
- spark-submit
  docker build --target spark-submit -t spark-submit .
- spark-history-server
  docker build --target spark-history-server -t spark-history-server .
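After pulling and building, you can check that all four images are available under the expected tags with the standard docker image ls command:

# List local images; sbt, spark-shell, spark-submit, and
# spark-history-server should all appear as repository names.
docker image ls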
You can then run the following commands from the Spark application root (the folder containing the build.sbt file). Please make sure to use the provided template project.

The commands below are provided as a reference, and they will be used throughout the rest of this guide. You do not have to run them now, because some of them (e.g. spark-submit) require additional parameters that we will provide later in the manual.
- To run SBT to package or test your application (sbt <command>):
  docker run -it --rm -v "`pwd`":/root sbt sbt
- To start a Spark shell (spark-shell):
  docker run -it --rm -v "`pwd`":/io spark-shell
- To run your Spark application (spark-submit) (fill in the class name of your application and the name of your project!):
  docker run -it --rm -v "`pwd`":/io -v "`pwd`"/spark-events:/spark-events \
    spark-submit --class <YOUR_CLASSNAME> \
    target/scala-2.12/<YOUR_PROJECT_NAME>_2.12-1.0.jar
- To spawn the history server to view event logs, accessible at localhost:18080:
  docker run -it --rm -v "`pwd`"/spark-events:/spark-events \
    -p 18080:18080 spark-history-server
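Note that the spark-submit and history server commands bind-mount a spark-events folder from the application root. It may be worth creating that folder up front, since Docker otherwise creates missing bind-mount directories on the host itself (owned by root on Linux):

# Create the event-log directory in the application root before the
# first spark-submit or history server run (run once per project).
mkdir -p spark-events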
Further on in the manual, we will generally not spell out the full Docker commands this explicitly again, so keep in mind that if we mention e.g. spark-shell, you should run the corresponding docker run command listed above. You can create scripts or aliases for your favorite shell to avoid having to type a lot; a small sketch follows below.
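As an example of such aliases, the snippet below is a minimal sketch for a Bash-like shell, assuming the image names used above; you could add it to your ~/.bashrc or ~/.zshrc:

# Hypothetical convenience aliases wrapping the docker run commands above.
# Arguments you type after the alias (e.g. "sbt test") are appended as usual.
alias sbt='docker run -it --rm -v "`pwd`":/root sbt sbt'
alias spark-shell='docker run -it --rm -v "`pwd`":/io spark-shell'
alias spark-submit='docker run -it --rm -v "`pwd`":/io -v "`pwd`"/spark-events:/spark-events spark-submit'
alias spark-history-server='docker run -it --rm -v "`pwd`"/spark-events:/spark-events -p 18080:18080 spark-history-server'

With these in place, running e.g. sbt test from the project root executes the tests inside the container.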