Hadoop 3.0 Docker
Hadoop 3.0 Docker : Docker enables users to bundle an application together with its preferred execution environment. In this article, we look at Hadoop and Docker together: what the benefits are and how to set up a single node.
Hadoop 3.0 Docker : Benefits
- Sets up, installs, and runs Hadoop 3.0 in very little time.
- Uses available resources only as needed, so resources are not wasted.
- Scales easily; best suited for testing environments in a Hadoop 3.0 cluster.
- No need to worry about Hadoop 3.0 dependencies, libraries, and so on.
Single Node Docker Setup
Assumptions & Hardware Requirements
- Ubuntu 16.04 system
- Docker is already installed and configured
Before we start the single-node Hadoop 3.0 cluster using Docker, let us run a simple example to check that Docker is working correctly on the system.
hadoop@hadoop-VirtualBox:~$ docker ps
No containers are running yet (docker ps lists running containers, not images). So let's run the simple hello-world Docker example.
hadoop@hadoop-VirtualBox:~$ docker run hello-world

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker Hub account:
 https://hub.docker.com

For more examples and ideas, visit:
 https://docs.docker.com/engine/userguide/
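If you want to repeat this check from a script instead of reading the output by eye, a minimal Python sketch could look like the following. The function names are hypothetical and chosen only for illustration. The sketch assumes that the hello-world image still prints the "Hello from Docker!" line shown above, and it adds the --rm flag so that the test container is removed afterwards.

```python
import shutil
import subprocess

# Greeting line printed by the hello-world image (taken from the output above).
GREETING = "Hello from Docker!"


def output_confirms_docker(output: str) -> bool:
    """Return True if any line of the text is exactly the greeting."""
    return any(line.strip() == GREETING for line in output.splitlines())


def docker_smoke_test(timeout_seconds: int = 120) -> bool:
    """Run `docker run --rm hello-world` and check its output.

    Returns False if the docker CLI is not on PATH, the command fails
    or times out, or the greeting line is missing.
    """
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "run", "--rm", "hello-world"],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and output_confirms_docker(result.stdout)
```

Calling docker_smoke_test() on the machine used for this setup should return True once Docker is installed and the current user is allowed to talk to the Docker daemon.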
This confirms that Docker is working properly. Let us go ahead and install Hadoop 3.0 in a Docker container. To do so, we need a Hadoop 3.x Docker image.
hadoop@hadoop-VM:~$ sudo docker pull images/hadoop-docker:latest
[sudo] password for hadoop:
3.0.2: Pulling from sequenceiq/hadoop-docker
94b97xa021dx: Pull complete
............: Pull complete
59:xaef4e99f98g4ca8b655dea261b92554740ec3c133e0826866c49319af7359db
Status: Downloaded newer image for hadoop-docker:3.0.2
Run a Docker command such as docker images to validate that the Docker image is available.
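The image check can also be scripted. The sketch below is only an illustration, with hypothetical function names: it assumes the listing comes from docker images --format "{{.Repository}}:{{.Tag}}", which prints one repository:tag pair per line, and it uses the repository and tag that appear in the pull output above as sample values.

```python
import subprocess


def parse_image_listing(listing: str) -> set:
    """Turn 'repository:tag' lines into a set, skipping blank lines."""
    return {line.strip() for line in listing.splitlines() if line.strip()}


def image_is_listed(listing: str, repository: str, tag: str) -> bool:
    """Return True if repository:tag appears in the listing text."""
    return f"{repository}:{tag}" in parse_image_listing(listing)


def list_local_images() -> str:
    """Ask the local Docker daemon for its images, one 'repository:tag' per line.

    Raises CalledProcessError if the docker command fails.
    """
    result = subprocess.run(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, image_is_listed(list_local_images(), "sequenceiq/hadoop-docker", "3.0.2") reports whether that image is present locally; replace the repository and tag with whatever name your own pull reported.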
Configure YARN
The following properties should be set in yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
    <description>
      This is the container executor setting that ensures that all applications
      are started with the LinuxContainerExecutor.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>hadoop</value>
    <description>
      The POSIX group of the NodeManager. It should match the setting in
      "container-executor.cfg". This configuration is required for validating
      the secure access of the container-executor binary.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
    <value>false</value>
    <description>
      Whether all applications should be run as the NodeManager process' owner.
      When false, applications are launched instead as the application owner.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
    <value>default,docker</value>
    <description>
      Comma separated list of runtimes that are allowed when using
      LinuxContainerExecutor. The allowed values are default, docker, and
      javasandbox.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
    <value>host,none,bridge</value>
    <description>
      Optional. A comma-separated set of networks allowed when launching
      containers. Valid values are determined by Docker networks available from
      `docker network ls`
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
    <value>host</value>
    <description>
      The network used when launching Docker containers when no network is
      specified in the request. This network must be one of the (configurable)
      set of allowed container networks.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.host-pid-namespace.allowed</name>
    <value>false</value>
    <description>
      Optional. Whether containers are allowed to use the host PID namespace.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.allowed</name>
    <value>false</value>
    <description>
      Optional. Whether applications are allowed to run in privileged
      containers.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.privileged-containers.acl</name>
    <value></value>
    <description>
      Optional. A comma-separated list of users who are allowed to request
      privileged containers if privileged containers are allowed.
    </description>
  </property>

  <property>
    <name>yarn.nodemanager.runtime.linux.docker.capabilities</name>
    <value>CHOWN,DAC_OVERRIDE,FSETID,FOWNER,MKNOD,NET_RAW,SETGID,SETUID,SETFCAP,SETPCAP,NET_BIND_SERVICE,SYS_CHROOT,KILL,AUDIT_WRITE</value>
    <description>
      Optional. This configuration setting determines the capabilities
      assigned to docker containers when they are launched. While these may not
      be case-sensitive from a docker perspective, it is best to keep these
      uppercase. To run without any capabilities, set this value to
      "none" or "NONE"
    </description>
  </property>
</configuration>
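Because several of these properties depend on one another, it can help to sanity-check the file before restarting the NodeManager. The following Python sketch is one possible way to do that, not part of Hadoop itself: the helper names are hypothetical, the embedded sample holds only a subset of the properties listed above, and the three checks (executor class, allowed runtimes, default network) are simply the constraints stated in the descriptions.

```python
import xml.etree.ElementTree as ET

# A small subset of the yarn-site.xml properties above, embedded for illustration.
SAMPLE_YARN_SITE = """<configuration>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
    <value>default,docker</value>
  </property>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.allowed-container-networks</name>
    <value>host,none,bridge</value>
  </property>
  <property>
    <name>yarn.nodemanager.runtime.linux.docker.default-container-network</name>
    <value>host</value>
  </property>
</configuration>"""

RUNTIME_PREFIX = "yarn.nodemanager.runtime.linux."


def load_properties(xml_text: str) -> dict:
    """Parse Hadoop-style <property> entries into a name -> value dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def split_csv(value: str) -> list:
    """Split a comma-separated value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def check_docker_runtime(props: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    executor = props.get("yarn.nodemanager.container-executor.class", "")
    if not executor.endswith("LinuxContainerExecutor"):
        problems.append("container executor is not LinuxContainerExecutor")
    runtimes = split_csv(props.get(RUNTIME_PREFIX + "allowed-runtimes", ""))
    if "docker" not in runtimes:
        problems.append("docker is not in allowed-runtimes")
    allowed = split_csv(props.get(RUNTIME_PREFIX + "docker.allowed-container-networks", ""))
    default = props.get(RUNTIME_PREFIX + "docker.default-container-network", "")
    # Only compare when both are set explicitly; unset values fall back to Hadoop defaults.
    if allowed and default and default not in allowed:
        problems.append("default container network is not in the allowed networks")
    return problems


if __name__ == "__main__":
    for problem in check_docker_runtime(load_properties(SAMPLE_YARN_SITE)):
        print("WARNING:", problem)
```

To check a real installation, read the contents of your own yarn-site.xml and pass the text to load_properties instead of the embedded sample.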
Hadoop 3.0 Docker - Conclusion
Now a single-node Hadoop 3.0 cluster using Docker is up and running. A single-node setup requires nothing difficult, and the cluster is running in very little time. As mentioned before, Docker for Hadoop 3.0 is mostly used for QA environments, so if you want to test a Hadoop application, setting up a Hadoop cluster in a Docker container and testing the application there is the easiest and fastest way.