Hadoop 3.0 Installation

This post on Apache Hadoop 3.0 installation will walk you through installing and verifying both a pseudo-distributed (single-node) instance and a fully distributed instance on a UNIX box (RHEL or Ubuntu). Hadoop 3.0 requires Java 1.8 or higher, so this article assumes you are already familiar with the Hadoop 3.0 features and enhancements and with the minimum JDK requirement for the newer version of Hadoop. A distributed (cluster) Hadoop 3.0 installation needs passwordless SSH communication between the cluster nodes, and I will cover that in detail as well.

Let's look at the diagram below, which depicts two scenarios: a pseudo-distributed (single-node) instance and a distributed instance. You can skip the SSH setup step if you are installing on a single-node machine.

Hadoop 3.0 Installation Cluster Hardware

Enable SSH in all nodes in cluster for Hadoop 3.0 Installation

Each node in the cluster should be able to communicate with the others without prompting for authentication, and to make that happen, passwordless SSH remote login is required.

Execute the following commands on Ubuntu Linux:

sudo apt-get install ssh
sudo apt-get install pdsh

After a successful installation, generate a key pair:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Now set up passwordless SSH:

cat ~/.ssh/id_rsa.pub  >> ~/.ssh/authorized_keys

Now change the permissions of the file that contains the key:

chmod 0600 ~/.ssh/authorized_keys

Verify SSH access on the node:

ssh localhost
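For a multi-node cluster, the same public key must also be authorized on every worker node. One way to do that is with ssh-copy-id; the hostnames below (node1, node2) are placeholders for your own machines:

```shell
# Copy the master's public key to each worker node (node1/node2 are
# placeholder hostnames; replace with your cluster's). You will be
# prompted for each node's password one last time.
ssh-copy-id -i ~/.ssh/id_rsa.pub node1
ssh-copy-id -i ~/.ssh/id_rsa.pub node2

# Verify passwordless login works; this should print the remote
# hostname without asking for a password.
ssh node1 hostname
```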

Download Hadoop 3.0 and Install

Visit the Apache Hadoop 3.0 download site and download the tar file:

Hadoop 3.0 download mirror link

The following commands will download and extract the release (change the folders as needed for your environment):

cd /tmp
wget http://www-us.apache.org/dist/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
mkdir -p /home/toppertips/hadoop3.x
tar -xzf /tmp/hadoop-3.0.0.tar.gz -C /home/toppertips/hadoop3.x --strip-components=1
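Before extracting, it is worth verifying the tarball. Apache publishes checksums for each release on the download page (the exact checksum file format has varied between releases), so a simple manual comparison works everywhere:

```shell
# Print the SHA-256 digest of the downloaded tarball and compare it
# by eye with the checksum published on the Apache download page.
sha256sum /tmp/hadoop-3.0.0.tar.gz
```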

Hadoop 3.0 Configuration

Once we have extracted the Hadoop 3.0 bundle, it is time to configure some paths in the .bashrc file, so open the .bashrc file in the user's home directory and add the following entries:

export HADOOP_PREFIX="/home/toppertips/hadoop3.x"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}

#after adding the lines above, reload the file
source ~/.bashrc
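Once the file is sourced, a quick check confirms the variables took effect (the path shown matches the example above; adjust it for your own install location):

```shell
# Confirm the environment points at the extracted bundle
echo "$HADOOP_PREFIX"   # expect /home/toppertips/hadoop3.x
command -v hadoop       # expect $HADOOP_PREFIX/bin/hadoop

# Print the Hadoop build info; the first line names the version
hadoop version
```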

Configure hadoop-env.sh

Update the configuration file called hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:

export JAVA_HOME=/usr/lib/jvm/java-8/
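The JDK path varies by distribution, so the value above is only an example. One way to discover the actual path on your machine (assuming a java binary is on the PATH):

```shell
# Resolve the real location of the java binary through any symlinks,
# then strip the trailing /bin/java to get a JAVA_HOME candidate.
readlink -f "$(command -v java)" | sed 's:/bin/java$::'
```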

Configure core-site.xml, hdfs-site.xml & mapred-site.xml

Edit the configuration file core-site.xml, located in HADOOP_HOME/etc/hadoop, and add the following entries:

<configuration>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://localhost:9000</value>
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/home/toppertips/hdata</value>
	</property>
</configuration>

Now open the hdfs-site.xml file located in HADOOP_HOME/etc/hadoop and add the following entries (use a replication factor of 1 for a single-node setup; 3 suits a cluster with three or more DataNodes):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

Now edit mapred-site.xml; if the file is missing and only the template is present, create it by copying the template:

cp mapred-site.xml.template mapred-site.xml

Add the following configuration to the file:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Configure yarn-site.xml

The yarn-site.xml file is located in HADOOP_HOME/etc/hadoop; open it and add/edit the following section as shown below:

<configuration>
  <property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
 </property>
 <property>
   <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
   <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
</configuration>
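Before starting the daemons, it can save time to confirm the edited files are still well-formed XML. xmllint (from the libxml2-utils package on Ubuntu) is one way to do that:

```shell
# Check each edited configuration file for XML syntax errors;
# xmllint is silent on success and prints the error otherwise.
cd "$HADOOP_PREFIX/etc/hadoop"
for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
  xmllint --noout "$f" && echo "$f OK"
done
```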

Now you are all set to start your Hadoop 3.0 installation services.

Start Hadoop 3.0 Installation Services

NOTE: This command should only be run on your first Hadoop installation. Do not run it on an existing Hadoop installation, or it will permanently erase all data in the HDFS filesystem.

The first step in starting up your Hadoop 3.0 installation is formatting the Hadoop filesystem, followed by starting the HDFS service, as shown below:

#only perform after fresh and first installation
bin/hdfs namenode -format

#Now start the hdfs service using dfs command
sbin/start-dfs.sh

#if startup fails with a pdsh error, set ssh as the default remote command
echo "ssh" | sudo tee /etc/pdsh/rcmd_default

#now start the yarn services
sbin/start-yarn.sh

#Once everything has started successfully, check which daemons are running
jps
2961 ResourceManager
2482 DataNode
3077 NodeManager
2366 NameNode
2686 SecondaryNameNode
3199 Jps

Congratulations, your installation is complete and you are ready to run MapReduce programs.
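As a quick smoke test (run from the Hadoop install directory, assuming the daemons above are up), copy a file into HDFS and list it back:

```shell
# Create a home directory in HDFS for the current user
bin/hdfs dfs -mkdir -p /user/$(whoami)

# Put a local file into HDFS and list it back
bin/hdfs dfs -put etc/hadoop/core-site.xml /user/$(whoami)/
bin/hdfs dfs -ls /user/$(whoami)
```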

Hadoop 3.0 Installation Health Check

In earlier Hadoop 2.x versions the NameNode web UI port was 50070; in Hadoop 3.0 it has moved to 9870. It can be accessed in a browser at localhost:9870.
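The port change can also be confirmed from the command line without a browser (assumes curl is installed and the NameNode is running locally):

```shell
# Request the NameNode web UI and print only the HTTP status code;
# on Hadoop 3.0 the UI answers on 9870 (it was 50070 on Hadoop 2.x).
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9870/
```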

Hadoop 3.0 Health Check Web UI


Hadoop 3.0 Downstream Compatibility

The following compatibility matrix lists downstream Apache projects and the versions that were compiled and run through unit tests and basic functional testing against Hadoop 3.0. This validation was done as part of the Hadoop 3.0 Beta 1 release in October 2017.

Apache Project   Version
HBase            2.0.0
Spark            2.0
Hive             2.1.0
Oozie            5.0
Pig              0.16
Solr             6.x
Kafka            0.10

More on Hadoop 3.0 Related Topics

1. Hadoop 3.0 Features and Enhancements – all the newly added features and enhancements in Hadoop 3.0
2. Hadoop 3.0 vs Hadoop 2.0 – a detailed comparison and the benefits it brings to developers
3. Hadoop 3.0 Installation – this article
4. Hadoop 3.0 Release Date
5. Hadoop 3.0 Security – book by Ben and Joey
6. Hadoop 3.0 Architecture – demystifying the architecture and its components
7. Hadoop 3.0 Hortonworks – Hortonworks support for Hadoop 3.0 in the HDP 3.0 release

Venky Jayaraman

20+ years of extensive expertise developing and managing enterprise-level applications. 7+ years of extensive experience building and implementing Big Data solutions.