Hadoop 3.0 Docker

Hadoop 3.0 Docker: Docker enables users to bundle an application together with its preferred execution environment. In this article, we look at Hadoop and Docker together: the benefits of running Hadoop 3.0 in Docker, and how to set up a single-node cluster.

Hadoop 3.0 Docker: Benefits

Sets up, installs and runs Hadoop 3.0 in no time.
Uses the available resources only as needed, so no resources are wasted.
Scales easily, which makes it best suited for Hadoop 3.0 test environments.
No need to worry about Hadoop 3.0 dependencies, libraries, etc.

Single Node Docker Setup

Assumptions & Requirements

Ubuntu 16.04 system
Docker is already installed and configured

Before we start the single-node Hadoop 3.0 cluster with Docker, let us run a simple example to check that Docker is working correctly on the system.

    hadoop@hadoop-VirtualBox:~$ docker ps

The list is empty, so no containers are running yet. Let's run the simple hello-world Docker example.

    hadoop@hadoop-VirtualBox:~$ docker run hello-world

    Hello from Docker!
    This message shows that your installation appears to be working correctly.

    To generate this message, Docker took the following steps:
     1. The Docker client contacted the Docker daemon.
     2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
     3. The Docker daemon created a new container from that image which runs the
        executable that produces the output you are currently reading.
     4. The Docker daemon streamed that output to the Docker client, which sent it
        to your terminal.

    To try something more ambitious, you can run an Ubuntu container with:
     $ docker run -it ubuntu bash

    Share images, automate workflows, and more with a free Docker Hub account:
     https://hub.docker.com

    For more examples and ideas, visit:
     https://docs.docker.com/engine/userguide/

This confirms that Docker is working properly. Let us go ahead and install Hadoop 3.0 in a Docker container. To do so, we first need a Hadoop 3.x Docker image, so we pull one:

    hadoop@hadoop-VM:~$ sudo docker pull sequenceiq/hadoop-docker:3.0.2
    [sudo] password for hadoop:
    3.0.2: Pulling from sequenceiq/hadoop-docker
    94b97xa021dx: Pull complete
    ............: Pull complete
    59:xaef4e99f98g4ca8b655dea261b92554740ec3c133e0826866c49319af7359db
    Status: Downloaded newer image for hadoop-docker:3.0.2

To confirm that the image is now available locally, run `sudo docker images` and check that the hadoop-docker repository appears in the list.
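If you prefer to script this check (for example, in a setup script for a test environment), the following is a minimal sketch for a POSIX shell. The function name image_status is hypothetical; the sketch relies only on `docker images -q`, which prints the IDs of matching local images and prints nothing when there is no match. Run it as a user that is allowed to talk to the Docker daemon (the pull above used sudo); otherwise the image is reported as missing.

```shell
#!/bin/sh
# Minimal sketch: report whether an image reference (repository:tag) exists
# in the local Docker image cache. image_status is a hypothetical helper name.
image_status() {
  # `docker images -q REF` prints the IDs of matching local images, or nothing.
  # Errors (CLI not installed, daemon unreachable, no permission) are
  # discarded, so those cases are also reported as "missing".
  if [ -n "$(docker images -q "$1" 2>/dev/null)" ]; then
    echo "available"
  else
    echo "missing"
  fi
}

# Repository and tag as reported in the pull output above.
image_status sequenceiq/hadoop-docker:3.0.2
```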

Configure YARN

To let the YARN NodeManager launch applications in Docker containers, set the following properties in yarn-site.xml:

    <configuration>
      <property>
        <name>yarn.nodemanager.container-executor.class</name>
        <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
        <description>
          This is the container executor setting that ensures that all applications
          are started with the LinuxContainerExecutor.
        </description>
      </property>

      <property>
        <name>yarn.nodemanager.linux-container-executor.group</name>
        <value>hadoop</value>
        <description>
          The POSIX group of the NodeManager. It should match the setting in
          "container-executor.cfg". This configuration is required for validating
          the secure access of the container-executor binary.
        </description>
      </property>

      <property>
        <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
        <value>false</value>
        <description>
          Whether all applications should be run as the NodeManager process' owner.
          When false, applications are launched instead as the application owner.
        </description>
      </property>

      <property>
        <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
        <value>default,docker</value>
        <description>
          Comma-separated list of runtimes that are allowed when using
          LinuxContainerExecutor. The allowed values are default, docker, and
          javasandbox.
        </description>
      </property>

      <property>
        <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
        <value>host,none,bridge</value>
        <description>
          Optional. A comma-separated set of networks allowed when launching
          containers. Valid values are determined by Docker networks available from
          `docker network ls`.
        </description>
      </property>

      <property>
        <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
        <value>host</value>
        <description>
          The network used when launching Docker containers when no
          network is specified in the request. This network must be one of the
          (configurable) set of allowed container networks.
        </description>
      </property>

      <property>
        <name>yarn.nodemanager.runtime.linux.docker.host-pid-namespace.allowed</name>
        <value>false</value>
        <description>
          Optional. Whether containers are allowed to use the host PID namespace.
        </description>
      </property>

      <property>
        <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
        <value>false</value>
        <description>
          Optional. Whether applications are allowed to run in privileged
          containers.
        </description>
      </property>

      <property>
        <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
        <value></value>
        <description>
          Optional. A comma-separated list of users who are allowed to request
          privileged containers if privileged containers are allowed.
        </description>
      </property>

      <property>
        <name>yarn.nodemanager.runtime.linux.docker.capabilities</name>
        <value>CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE</value>
        <description>
          Optional. This configuration setting determines the capabilities
          assigned to docker containers when they are launched. While these may not
          be case-sensitive from a docker perspective, it is best to keep these
          uppercase. To run without any capabilities, set this value to
          "none" or "NONE".
        </description>
      </property>
    </configuration>
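Two of the property descriptions above state rules that are easy to break when editing the file by hand: the default container network must be one of the allowed container networks, and capability names are best kept in upper case. The following is a minimal sketch of both checks for a POSIX shell. The helper names in_csv_list and normalize_caps are hypothetical, and the values are copied by hand from the configuration above rather than parsed out of yarn-site.xml.

```shell
#!/bin/sh
# Sanity checks for two Docker-related yarn-site.xml values.
# The values are copied by hand; this script does not read yarn-site.xml.

# Hypothetical helper: return 0 if $1 is an exact entry of the
# comma-separated list $2, and 1 otherwise.
in_csv_list() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: upper-case a comma-separated capability list,
# following the advice in the capabilities description.
normalize_caps() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

allowed_networks="host,none,bridge"  # ...docker.allowed-container-networks
default_network="host"               # ...docker.default-container-network

if in_csv_list "$default_network" "$allowed_networks"; then
  echo "OK: default network '$default_network' is in the allowed list"
else
  echo "ERROR: default network '$default_network' is not in '$allowed_networks'" >&2
fi

normalize_caps "chown,setuid,kill"   # prints CHOWN,SETUID,KILL
```

Wrapping both the list and the candidate in commas makes the match exact, so a value such as "hos" is not accepted just because it is a prefix of "host".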

Hadoop 3.0 Docker - Conclusion

Now a single-node Hadoop 3.0 cluster using Docker is up and running. A single-node setup requires nothing difficult, and the cluster is running in very little time. As mentioned before, Docker for Hadoop 3.0 is mostly used for QA (testing) environments, so if you want to test a Hadoop application, setting up a Hadoop cluster in a Docker container and testing the application there is the easiest and fastest way.
