Hadoop 3.0 Installation on Windows

Apache Hadoop 3.0 Installation on Windows is a short and practical guide for big data engineers to get their hands dirty. Since Hadoop 3.0 is not yet available in Cloudera CDH 6.x or Hortonworks HDP 3.x, this guide navigates you through the installation steps without Cygwin.

Since the Hadoop 3.0 new features are built on Java 1.8, you need the following preparation before you start the Hadoop 3.0 installation.

Java 1.8 for Hadoop 3.0 Installation on Windows

You must have administrative privileges to install JDK 1.8 on your Windows machine. Visit the Oracle website and download the binaries to install it.

Once it is installed (or if it was already installed), run the “java -version” command in the Windows command prompt to validate the installation.

Java 1.8 Installation

If you have trouble running this command, check the Windows PATH and the JAVA_HOME variable.
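As a quick sanity check, the commands below verify the Java installation from the Windows command prompt. This is a sketch; the JDK path shown is only an example and may differ on your machine.

```
:: Verify the JDK is on the PATH
java -version

:: Verify JAVA_HOME points at the JDK install directory
echo %JAVA_HOME%

:: If either fails, set the variables for the current session (example path)
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_131
set PATH=%JAVA_HOME%\bin;%PATH%
```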

Download & Extract Hadoop 3.0 Binaries

Download the latest Hadoop 3.0 bundle from the official Apache website. General availability (GA) marks a point of quality and stability for the release series, indicating that it is ready for production use. You can also download the source code for this release, which is around 25 MB in size; the Hadoop 3.0 binary bundle is around 250 MB.

Apache Hadoop 3.0 Release

Since Hadoop 3.0 is built on Java, there is no separate distribution for Unix and Windows. The binaries are Java byte-code, which can run anywhere a JVM is available.

On successful download, validate the size of Hadoop 3.0 bundle.

Hadoop 3.0 Installation Extract Windows

While extracting the tar file, you may also encounter an extraction error as shown below. To avoid this, I extracted the tar on a UNIX machine and transferred the extracted folder to the Windows machine.

Hadoop 3.0 Extract Error on Windows
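The UNIX-side workaround described above can be sketched as follows. The archive name and target path are examples taken from this guide's layout and may differ for your download.

```
# On a UNIX machine: extract the downloaded bundle
tar -xzf hadoop-3.0.0.tar.gz

# Then transfer the extracted hadoop-3.0.0 folder to C:/hadoop_3_x/
# on the Windows machine (e.g. via WinSCP, scp, or a shared drive)
```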

Once extracted, the folder on Windows looks like this:

Hadoop 3.0 Extract Folders

Windows Path Setup for Hadoop 3.0

Now we need to check and set up the JAVA_HOME and HADOOP_HOME paths.

HADOOP_HOME for Hadoop 3.0

In the same way, set up JAVA_HOME and add the java\bin folder to the PATH variable, then verify those variables from the command prompt.

Hadoop and Java path for Hadoop 3.0
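The environment setup above can be sketched as the following commands, run in an elevated command prompt. The install paths are examples based on this guide's layout and may differ on your machine.

```
:: Persist HADOOP_HOME and JAVA_HOME (example paths)
setx HADOOP_HOME "C:\hadoop_3_x\hadoop-3.0.0"
setx JAVA_HOME "C:\Program Files\Java\jdk1.8.0_131"

:: Add the bin folders to PATH (note: setx truncates long PATH values,
:: so the System Properties > Environment Variables dialog is safer)
setx PATH "%PATH%;%HADOOP_HOME%\bin;%JAVA_HOME%\bin"

:: Open a NEW command prompt, then verify
echo %HADOOP_HOME%
echo %JAVA_HOME%
```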

Configuration: HDFS 3.0 Installation on Windows

Edit the file C:/hadoop_3_x/hadoop-3.0.0/etc/hadoop/core-site.xml, paste the XML block below into it, and save the file.

<configuration>
   <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
   </property>
</configuration>

Rename “mapred-site.xml.template” to “mapred-site.xml” if your bundle ships the template (the Hadoop 3.0 bundle may include mapred-site.xml directly). Then edit the file C:/hadoop_3_x/hadoop-3.0.0/etc/hadoop/mapred-site.xml, paste the XML block below into it, and save the file.

<configuration>
   <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
   </property>
</configuration>

Create a folder named “data” under “C:/hadoop_3_x/hadoop-3.0.0/”.
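This can be done from the command prompt. The namenode and datanode subfolders created here match the dfs.namenode.name.dir and dfs.datanode.data.dir values used in hdfs-site.xml; the paths follow this guide's example layout.

```
:: Create the data folders used by HDFS
mkdir C:\hadoop_3_x\hadoop-3.0.0\data
mkdir C:\hadoop_3_x\hadoop-3.0.0\data\namenode
mkdir C:\hadoop_3_x\hadoop-3.0.0\data\datanode
```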

Edit the file C:/hadoop_3_x/hadoop-3.0.0/etc/hadoop/hdfs-site.xml, paste the XML block below into it, and save the file.

<configuration>
   <property>
       <name>dfs.replication</name>
       <value>1</value>
   </property>
   <property>
       <name>dfs.namenode.name.dir</name>
       <value>C:/hadoop_3_x/hadoop-3.0.0/data/namenode</value>
   </property>
   <property>
       <name>dfs.datanode.data.dir</name>
       <value>C:/hadoop_3_x/hadoop-3.0.0/data/datanode</value>
   </property>
</configuration>

Edit the file C:/hadoop_3_x/hadoop-3.0.0/etc/hadoop/yarn-site.xml, paste the XML block below into it, and save the file.

<configuration>
   <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
   </property>
   <property>
       <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
</configuration>

Edit the file C:/hadoop_3_x/hadoop-3.0.0/etc/hadoop/hadoop-env.cmd and replace the line “set JAVA_HOME=%JAVA_HOME%” with the explicit path to your JDK 1.8 installation, for example:

@rem JAVA_HOME is required
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_131

Note that a JAVA_HOME containing spaces (such as “Program Files”) can break some Hadoop scripts; if you hit errors, use the 8.3 short form instead, e.g. C:\PROGRA~1\Java\jdk1.8.0_131.

Hadoop 3.0 Environment Command Windows


Open cmd and run the command “hdfs namenode -format”. You will see:

Hadoop 3.0 Data Format
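After formatting, the HDFS and YARN daemons have to be started before the health check below will work. A sketch, assuming the extraction path used throughout this guide; the start scripts live under the sbin folder of the extracted bundle:

```
:: Start the HDFS daemons (NameNode and DataNode)
C:\hadoop_3_x\hadoop-3.0.0\sbin\start-dfs.cmd

:: Start the YARN daemons (ResourceManager and NodeManager)
C:\hadoop_3_x\hadoop-3.0.0\sbin\start-yarn.cmd

:: Verify the daemon processes are running
jps
```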


Hadoop 3.0 Installation Health Check

In the previous Hadoop 2.x versions, the NameNode web UI port was 50070; in Hadoop 3.0 it has moved to 9870. The UI can be accessed in a browser at localhost:9870.

Hadoop 3.0 Health Check Web UI
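A quick way to confirm the NameNode is responding without a browser, assuming the daemons are running and the default port is unchanged (curl ships with recent Windows 10+ builds):

```
:: Request the NameNode web UI from the command line
curl http://localhost:9870
```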


Hadoop 3.0 Downstream Compatibility

The following version-compatibility matrix indicates the versions of different Apache projects and their unit-test status, including basic functional testing. This was done as part of the Hadoop 3.0 Beta 1 release in October 2017.

| Apache Project | Version | Compiles | Unit Testing Status | Basic Functional Testing |
|----------------|---------|----------|---------------------|--------------------------|
| HBase          | 2.0.0   |          |                     |                          |
| Spark          | 2.0     |          |                     |                          |
| Hive           | 2.1.0   |          |                     |                          |
| Oozie          | 5.0     |          |                     |                          |
| Pig            | 0.16    |          |                     |                          |
| Solr           | 6.x     |          |                     |                          |
| Kafka          | 0.10    |          |                     |                          |

More on Hadoop 3.0 Related Topics

| # | Other Articles | Link |
|---|----------------|------|
| 1 | All the newly added features and enhancements in Hadoop 3.0 | Hadoop 3.0 Features and Enhancements |
| 2 | Detailed comparison between Hadoop 3.0 vs Hadoop 2.0 and what benefits it brings to the developer | Hadoop 3.0 vs Hadoop 2.0 |
| 3 | Hadoop 3.0 Installation | Hadoop 3.0 Installation |
| 4 | Hadoop 3.0 Release Date | Hadoop 3.0 Release Date |
| 5 | Hadoop 3.0 Security book by Ben and Joey | Hadoop 3.0 Security |
| 6 | Demystify the Hadoop 3.0 architecture and its components | Hadoop 3.0 Architecture |
| 7 | Hadoop 3.0 & Hortonworks support for it in the HDP 3.0 release | Hadoop 3.0 Hortonworks |
