Hadoop 3.0 Installation
This blog post on Apache Hadoop 3.0 installation will help you install and verify a pseudo-distributed, single-node or fully distributed instance on a UNIX box (RHEL or Ubuntu). Hadoop 3.0 needs Java 1.8 or higher, so this article assumes you are already aware of the Hadoop 3.0 features and enhancements and the minimum JDK requirement for the newer version of Hadoop. A distributed, multi-node Hadoop 3.0 installation needs passwordless SSH communication between the cluster nodes, and I will also cover that in detail.
Let's look at the diagram below, which depicts two scenarios: one is the pseudo-distributed, single-node instance and the other is the distributed instance. You can skip the SSH step if you are installing on a single-node machine.
Enable SSH on all cluster nodes for Hadoop 3.0 Installation
Each node in the cluster should be able to communicate with the others without prompting for authentication, and to make that happen, passwordless SSH remote login is required.
Execute the following commands on Ubuntu Linux:
sudo apt-get install ssh
sudo apt-get install pdsh
On successful installation, generate a key pair:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Now set up passwordless SSH:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Now change the permissions of the file that contains the key:
chmod 0600 ~/.ssh/authorized_keys
Verify SSH login on the node:
ssh localhost
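For a multi-node cluster, the public key also needs to be copied to every other node so each machine can log in to the others without a password. A minimal sketch, assuming hypothetical hostnames worker1 and worker2 for your worker nodes:
# copy the public key to each worker node (hostnames are examples; replace with yours)
ssh-copy-id user@worker1
ssh-copy-id user@worker2
# verify passwordless login to a worker
ssh user@worker1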
Download Hadoop 3.0 and Install
Visit the Apache Hadoop 3.0 download site and download the tar file.
The following commands will download and extract it (change the folders as needed for your environment):
cd /tmp
wget http://www-us.apache.org/dist/hadoop/common/hadoop-3.0.0/hadoop-3.0.0.tar.gz
mkdir -p /home/toppertips/hadoop3.x
tar -xzf /tmp/hadoop-3.0.0.tar.gz -C /home/toppertips/hadoop3.x --strip-components=1
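Before extracting, it is worth verifying the download. A quick sanity check; compare the output against the checksum published on the Apache download page for this release:
# compute the SHA-256 of the tarball and compare it with the published checksum
sha256sum /tmp/hadoop-3.0.0.tar.gz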
Hadoop 3.0 Configuration
Once we have extracted the Hadoop 3.0 bundle, it is time to configure some paths in the .bashrc file, so open the .bashrc file in the user's home directory and add the following parameters:
export HADOOP_PREFIX="/home/toppertips/hadoop3.x"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
# after adding the above lines, reload the file
source ~/.bashrc
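To confirm the environment changes are picked up, check the Hadoop version; the first line of output should report 3.0.0:
hadoop version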
Configure hadoop-env.sh
Update the configuration file called hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-8/
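If you are not sure of the correct JDK path on your machine, one way to locate it (the exact directory varies by distribution and JDK package) is shown below:
# resolve the real path of the java binary; JAVA_HOME is the directory above /bin/java
readlink -f $(which java)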
Configure core-site.xml, hdfs-site.xml & mapred-site.xml
Edit the configuration file core-site.xml, located in HADOOP_HOME/etc/hadoop, and add the following entries:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/toppertips/hdata</value>
  </property>
</configuration>
Now open the hdfs-site.xml file located in HADOOP_HOME/etc/hadoop and add the following entries:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
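Keep in mind that a replication factor of 3 assumes at least three DataNodes. For the pseudo-distributed, single-node setup you may want to use a value of 1 instead, so HDFS does not report under-replicated blocks:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>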
Now edit mapred-site.xml; if only the template file is present in your distribution, copy it first:
cp mapred-site.xml.template mapred-site.xml
Add the following configuration to the file:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
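Depending on your environment, MapReduce jobs on YARN may fail to find the MRAppMaster class unless the MapReduce classpath is also set. One common fix is sketched below (the property comes from mapred-default.xml and relies on the HADOOP_MAPRED_HOME variable exported earlier):
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>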
Configure the yarn-site.xml file
The yarn-site.xml file is located in your HADOOP_HOME/etc/hadoop directory; open it and add/edit the following part as shown below:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Now you are all set to start your Hadoop 3.0 installation services.
Start Hadoop 3.0 Installation Services
NOTE: This command should only be executed on your first Hadoop installation. Do not perform this on an existing Hadoop installation, or it will permanently erase all your data from the HDFS file system.
The first step in starting up your Hadoop 3.0 installation is formatting the Hadoop filesystem, followed by starting the HDFS service, as shown below:
# only perform after a fresh, first installation
bin/hdfs namenode -format
# now start the HDFS daemons
sbin/start-dfs.sh
# if it gives a pdsh error at startup, run the following command and retry
echo "ssh" | sudo tee /etc/pdsh/rcmd_default
# now start the YARN services
sbin/start-yarn.sh
# once everything has started successfully, check which daemons are running
jps
2961 ResourceManager
2482 DataNode
3077 NodeManager
2366 NameNode
2686 SecondaryNameNode
3199 Jps
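As a quick smoke test, create an HDFS home directory for your user and run one of the example jobs bundled with the distribution (the jar name below matches the 3.0.0 release; adjust it if your version differs):
# create an HDFS home directory for the current user
bin/hdfs dfs -mkdir -p /user/$(whoami)
# run the bundled pi estimator as a small MapReduce test job
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0.jar pi 2 10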
Congratulations, your installation is complete and you are ready to run MapReduce programs.
Hadoop 3.0 Installation Health Check
In previous Hadoop 2.x versions, the NameNode web UI port was 50070; it has been moved to 9870 in Hadoop 3.0, so the UI can be accessed at localhost:9870.
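You can also confirm the web UIs are responding from the command line; 9870 is the NameNode UI and 8088 is the default ResourceManager UI in Hadoop 3.0:
# should print 200 if the NameNode web UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870
# should print 200 if the ResourceManager web UI is up
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8088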
Hadoop 3.0 Downstream Compatibility
The following version compatibility matrix indicates the versions of different Apache projects and their unit test status, including basic functionality testing. This testing was done as part of the Hadoop 3.0 Beta 1 release in Oct 2017.
[wpsm_comparison_table id="1" class=""]
More on Hadoop 3.0 Related Topics
[wpsm_comparison_table id="3" class="hover-col1"]