Hadoop 3.0 Cloudera

Hadoop 3.0 Cloudera (or CDH 6.x) is an obvious question once you have seen the list of new features and enhancements in Hadoop 3.0. At the time of writing this blog, the latest CDH release was 5.14, which is based on Hadoop 2.6.0. Hadoop 3.0 brings a lot of changes, and if you want to try it in standalone mode before a vendor bundle becomes available, the Apache release can already be installed. Because Hadoop 3.0 changes so much compared with Hadoop 2.0, it will take time for companies like Cloudera or Hortonworks to build a production-ready bundle around it.

Hadoop 3.0 GA was released on 14 Dec 2017, and if you follow the Hadoop 3.0 architecture you will find some very interesting changes, which will certainly take the bundlers a few months to turn into their next major release.

Most organizations now run an enterprise data lake on their own premises, and storage cost reduction is one of the most promising features of Hadoop 3.0, so Hadoop 3.0 Cloudera (CDH 6.x) should arrive as soon as possible.

Hadoop 3.0 Cloudera Vs CDH 5.14

Hortonworks' approach is to provide a new bundle for a minor version only when necessary to ensure the interoperability of the Apache project components. The following HDP components, with their package versions, are included in HDP 2.6.4. The list of components supported by Hadoop 3.0 Cloudera is not yet available.

Why Hadoop 3.0 Cloudera

Why not Hadoop 3.0 Cloudera (or HDP 3.0)? The cost reduction, Java 8 support, improved YARN Timeline Service, and many more cool features will make life much easier for developers, administrators, and the business. Hadoop 3.0 Cloudera will make data engineering projects more efficient and considerably cheaper. With the arrival of Hadoop 3.0 from Hortonworks, users will see the following upgrades:

  1. Java 8 (JDK 1.8) as the runtime for Hadoop 3.0
  2. Erasure Coding to reduce storage cost
  3. YARN Timeline Service v.2 (YARN-2928)
  4. New Default Ports for Several Services
  5. Intra-DataNode Balancer
  6. Shell Script Rewrite (HADOOP-9902)
  7. Shaded Client Jars
  8. Support for Opportunistic Containers
  9. MapReduce Task-Level Native Optimization
  10. Support for More than 2 NameNodes
  11. Support for Filesystem Connector
  12. Reworked Daemon and Task Heap Management
  13. Improved Fault-tolerance with Quorum Journal Manager
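
To make the storage cost claim in item 2 concrete, here is a rough back-of-the-envelope sketch in Python. It assumes classic 3x replication on one side and a Reed-Solomon layout with 6 data units and 3 parity units on the other. The function names are made up for this illustration and are not part of any Hadoop API, and the math ignores real-world effects such as small files and partially filled stripes.

```python
# Rough storage arithmetic: 3x replication vs. Reed-Solomon erasure coding.
# Assumptions (for illustration only): replication factor 3, RS(6, 3) layout.

def replication_raw_size(logical_size, replicas=3):
    """Raw capacity consumed when every block is stored `replicas` times."""
    return logical_size * replicas


def erasure_coded_raw_size(logical_size, data_units=6, parity_units=3):
    """Raw capacity consumed when each group of `data_units` data cells
    is stored together with `parity_units` parity cells."""
    return logical_size * (data_units + parity_units) / data_units


if __name__ == "__main__":
    logical_tb = 100  # hypothetical amount of user data, in TB

    replicated = replication_raw_size(logical_tb)
    erasure_coded = erasure_coded_raw_size(logical_tb)
    saving = 1 - erasure_coded / replicated

    print(f"3x replication : {replicated} TB raw")
    print(f"RS(6,3) coding : {erasure_coded} TB raw")
    print(f"Raw capacity saved: {saving:.0%}")
```

Under these assumptions, 100 TB of data needs 300 TB of raw disk with replication but only 150 TB with erasure coding, which is where the promised storage cost reduction comes from.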

Downstream Compatibility with Other Apache Projects

The following version compatibility matrix indicates the versions of the different Apache projects and their unit test status, including basic functionality testing. This was done as part of the Hadoop 3.0 Beta 1 release in Oct 2017.
