Hadoop 3.0 Docker

Hadoop 3.0 Docker: Docker enables users to bundle an application together with its preferred execution environment. In this article, we will talk about Hadoop and Docker together: what the benefits are, and how to do a single node setup.

Hadoop 3.0 Docker : Benefits

  1. Sets up, installs, and runs Hadoop 3.0 in no time.
  2. Uses the available resources as needed, so no resources are wasted.
  3. Scales easily; best suited for test environments of a Hadoop 3.0 cluster.
  4. No worries about Hadoop 3.0 dependencies, libraries, etc.

Single Node Docker Setup

Assumption & Hardware Requirement

  1. Ubuntu 16.04 system
  2. Docker is already installed and configured

Before we start the single node Hadoop 3.0 cluster using Docker, let us run a simple example to verify that Docker is working correctly on the system.

hadoop@hadoop-VirtualBox:~$ docker ps

We don't have any Docker images available yet, so let's run the simple hello-world Docker example.

    hadoop@hadoop-VirtualBox:~$ docker run hello-world

    Hello from Docker!
    This message shows that your installation appears to be working correctly.

    To generate this message, Docker took the following steps:
     1. The Docker client contacted the Docker daemon.
     2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
     3. The Docker daemon created a new container from that image which runs the
        executable that produces the output you are currently reading.
     4. The Docker daemon streamed that output to the Docker client, which sent it
        to your terminal.

    To try something more ambitious, you can run an Ubuntu container with:
     $ docker run -it ubuntu bash

    Share images, automate workflows, and more with a free Docker Hub account:
     https://hub.docker.com

    For more examples and ideas, visit:
     https://docs.docker.com/engine/userguide/

It is confirmed that Docker is working properly. Let us go ahead and install Hadoop 3.0 in a Docker container. To do so, we need a Hadoop 3.x Docker image.

    hadoop@hadoop-VirtualBox:~$ sudo docker pull sequenceiq/hadoop-docker:3.0.2
    [sudo] password for hadoop:
    3.0.2: Pulling from sequenceiq/hadoop-docker
    94b97xa021dx: Pull complete
    ............: Pull complete
    Digest: sha256:xaef4e99f98g4ca8b655dea261b92554740ec3c133e0826866c49319af7359db
    Status: Downloaded newer image for sequenceiq/hadoop-docker:3.0.2

Run `docker images` to validate that the Hadoop Docker image is now available locally.
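A minimal sketch of that check follows. The sample output row is illustrative (it echoes the pull above), not captured from a real run; in practice you would pipe the output of `docker images` itself through the same filter.

```shell
# `docker images` prints a header row followed by one row per local image.
# We filter out the header and look for the repository name exactly.
sample='REPOSITORY                 TAG     IMAGE ID      CREATED   SIZE
sequenceiq/hadoop-docker   3.0.2   94b97xa021dx  2 days    1.8GB'

echo "$sample" | awk 'NR>1 {print $1}' | grep -qx 'sequenceiq/hadoop-docker' \
  && echo 'image present'

# Against a live daemon, the equivalent check is:
#   docker images | awk 'NR>1 {print $1}' | grep -qx 'sequenceiq/hadoop-docker'
```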

Configure YARN

The following properties should be set in yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    <description>
      This is the container executor setting that ensures that all applications
      are started with the LinuxContainerExecutor.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>hadoop</value>
    <description>
      The POSIX group of the NodeManager. It should match the setting in
      "container-executor.cfg". This configuration is required for validating
      the secure access of the container-executor binary.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
    <value>false</value>
    <description>
      Whether all applications should be run as the NodeManager process' owner.
      When false, applications are launched instead as the application owner.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
    <value>default,docker</value>
    <description>
      Comma-separated list of runtimes that are allowed when using
      LinuxContainerExecutor. The allowed values are default, docker, and
      javasandbox.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
    <value>host,none,bridge</value>
    <description>
      Optional. A comma-separated set of networks allowed when launching
      containers. Valid values are determined by the Docker networks available
      from `docker network ls`.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
    <value>host</value>
    <description>
      The network used when launching Docker containers when no
      network is specified in the request. This network must be one of the
      (configurable) set of allowed container networks.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.host-pid-namespace.allowed</name>
    <value>false</value>
    <description>
      Optional. Whether containers are allowed to use the host PID namespace.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
    <value>false</value>
    <description>
      Optional. Whether applications are allowed to run in privileged
      containers.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
    <value></value>
    <description>
      Optional. A comma-separated list of users who are allowed to request
      privileged containers if privileged containers are allowed.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.capabilities</name>
    <value>CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE</value>
    <description>
      Optional. This configuration setting determines the capabilities
      assigned to Docker containers when they are launched. While these may not
      be case-sensitive from a Docker perspective, it is best to keep these
      uppercase. To run without any capabilities, set this value to
      "none" or "NONE".
    </description>
  </property>
</configuration>
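The group configured in yarn.nodemanager.linux-container-executor.group above must match the group in etc/hadoop/container-executor.cfg, which is also where the Docker module itself is enabled. A minimal sketch, with the banned users, minimum UID, and Docker binary path assumed for a typical setup rather than taken from this article:

```
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000

[docker]
  module.enabled=true
  docker.binary=/usr/bin/docker
```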

Hadoop 3.0 Docker - Conclusion

Now a single node Hadoop 3.0 cluster using Docker is up and running. A single node setup requires nothing difficult, and the cluster is running in no time. As mentioned before, Docker for Hadoop 3.0 is mostly used for QA environments, so if you want to test a Hadoop application, setting up a Hadoop cluster in a Docker container and running the application against it is the easiest and fastest way.
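Once YARN is configured as above, applications request the Docker runtime through environment variables on their containers. A sketch of launching a MapReduce example job this way, per the Hadoop documentation on Docker containers; the image name here is an example, and the jar path will vary by installation:

```shell
# Environment variables understood by the LinuxContainerExecutor: which
# runtime to use and which Docker image to launch containers from.
DOCKER_IMAGE='sequenceiq/hadoop-docker:3.0.2'
vars="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=$DOCKER_IMAGE"
echo "$vars"

# On the cluster (not runnable here), a pi job would be submitted with:
#   hadoop jar hadoop-mapreduce-examples.jar pi \
#     -Dmapreduce.map.env="$vars" -Dmapreduce.reduce.env="$vars" 10 100
```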

More on Hadoop 3.0 Related Topics