Hadoop 3.0 Installation
This blog post on Apache Hadoop 3.0 installation will help you install and verify a pseudo-distributed, single-node or fully distributed instance on a UNIX box (RHEL or Ubuntu). Hadoop 3.0 needs Java 1.8 or higher, so this article assumes you are already aware of the Hadoop 3.0 features and enhancements and the minimum JDK requirement for the newer version of Hadoop. A distributed, multi-node Hadoop 3.0 installation needs passwordless SSH communication between the cluster nodes, and I will also cover that in detail.
Let's look at the diagram below, which depicts two scenarios: one is the pseudo-distributed, single-node instance and the other is the distributed instance. You can skip the SSH step if you are installing on a single-node machine.
Enable SSH on all cluster nodes for Hadoop 3.0 Installation
Each node in the cluster should be able to communicate with the others without prompting for authentication, and to make that happen, passwordless SSH remote login is required.
Execute the following commands on Ubuntu Linux:
sudo apt-get install ssh
sudo apt-get install pdsh
On successful installation, generate a key pair:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Now set up passwordless SSH:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Now change the permissions of the file that contains the key:
chmod 0600 ~/.ssh/authorized_keys
Verify SSH login on the node:
ssh localhost
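For a multi-node cluster, the public key also needs to be copied to every other node so each machine can log in to the others without a password. A minimal sketch, assuming hypothetical hostnames worker1 and worker2 for your worker nodes:
# copy the public key to each worker node (hostnames are examples; replace with yours)
ssh-copy-id user@worker1
ssh-copy-id user@worker2
# verify passwordless login to a worker
ssh user@worker1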
Download Hadoop 3.0 and Install
Visit the Apache Hadoop 3.0 download site and download the tar file.
The following commands will download and extract it (change the folders as needed for your environment):
cd /tmp
wget http://www-us.apache.org/dist/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
mkdir -p /home/toppertips/hadoop3.x
tar -xzf /tmp/hadoop-3.0.0.tar.gz -C /home/toppertips/hadoop3.x --strip-components=1
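Before extracting, it is worth verifying the download. A quick sanity check; compare the output against the checksum published on the Apache download page for this release:
# compute the SHA-256 of the tarball and compare it with the published checksum
sha256sum /tmp/hadoop-3.0.0.tar.gz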
Hadoop 3.0 Configuration
Once we have extracted the Hadoop 3.0 bundle, it is time to configure some paths in the .bashrc file, so open the .bashrc file in the user's home directory and add the following parameters:
export HADOOP_PREFIX="/home/toppertips/hadoop3.x"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
# after adding the above lines, reload the file
source ~/.bashrc
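To confirm the environment changes are picked up, check the Hadoop version; the first line of output should report 3.0.0:
hadoop version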
Configure hadoop-env.sh
Update the configuration file called hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-8/
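If you are not sure of the correct JDK path on your machine, one way to locate it (the exact directory varies by distribution and JDK package) is shown below:
# resolve the real path of the java binary; JAVA_HOME is the directory above /bin/java
readlink -f $(which java)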
Configure core-site.xml, hdfs-site.xml & mapred-site.xml
Edit the configuration file core-site.xml, located in HADOOP_HOME/etc/hadoop, and add the following entries:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/toppertips/hdata</value>
  </property>
</configuration>
Now open the hdfs-site.xml file located in HADOOP_HOME/etc/hadoop and add the following entries:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
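Keep in mind that a replication factor of 3 assumes at least three DataNodes. For the pseudo-distributed, single-node setup you may want to use a value of 1 instead, so HDFS does not report under-replicated blocks:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>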
Now edit mapred-site.xml; if only the template file is present in your distribution, copy it first:
cp mapred-site.xml.template mapred-site.xml
Add the following configuration to the file:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
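Depending on your environment, MapReduce jobs on YARN may fail to find the MRAppMaster class unless the MapReduce classpath is also set. One common fix is sketched below (the property comes from mapred-default.xml and relies on the HADOOP_MAPRED_HOME variable exported earlier):
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>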
Configure the yarn-site.xml file
The yarn-site.xml file is located in your HADOOP_HOME/etc/hadoop directory; open it and add/edit the following part as shown below:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Now you are all set to start your Hadoop 3.0 installation services.
Start Hadoop 3.0 Installation Services
NOTE: This command should only be executed on your first Hadoop installation. Do not perform this on an existing Hadoop installation, or it will permanently erase all your data from the HDFS file system.
The first step in starting up your Hadoop 3.0 installation is formatting the Hadoop filesystem, followed by starting the HDFS service, as shown below:
# only perform after a fresh, first installation
bin/hdfs namenode -format
# now start the HDFS daemons
sbin/start-dfs.sh
# if it gives a pdsh error at startup, run the following command and retry
echo "ssh" | sudo tee /etc/pdsh/rcmd_default
# now start the YARN services
sbin/start-yarn.sh
# once everything has started successfully, check which daemons are running
jps
2961 ResourceManager
2482 DataNode
3077 NodeManager
2366 NameNode
2686 SecondaryNameNode
3199 Jps
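As a quick smoke test, create an HDFS home directory for your user and run one of the example jobs bundled with the distribution (the jar name below matches the 3.0.0 release; adjust it if your version differs):
# create an HDFS home directory for the current user
bin/hdfs dfs -mkdir -p /user/$(whoami)
# run the bundled pi estimator as a small MapReduce test job
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar pi 2 10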
Congratulations, your installation is complete and you are ready to run MapReduce programs.
Hadoop 3.0 Installation Health Check
In previous Hadoop 2.x versions, the NameNode web UI port was 50070; it has been moved to 9870 in Hadoop 3.0, so the UI can be accessed at localhost:9870.
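You can also confirm the web UIs are responding from the command line; 9870 is the NameNode UI and 8088 is the default ResourceManager UI in Hadoop 3.0:
# should print 200 if the NameNode web UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870
# should print 200 if the ResourceManager web UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088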
Hadoop 3.0 Downstream Compatibility
The following version compatibility matrix indicates the versions of different Apache projects and their unit test status, including basic functionality testing. This testing was done as part of the Hadoop 3.0 Beta 1 release in Oct 2017.
[wpsm_comparison_table id="1" class=""]
More on Hadoop 3.0 Related Topics
[wpsm_comparison_table id="3" class="hover-col1"]