Hadoop 3.0 Interview Question

Hadoop 3.0 and big data jobs are in demand. This Hadoop 3.0 Interview Question article covers almost all the important topics and includes reference links to other tutorials.

Hadoop 3.0 New Features Questions

What are the new features in Hadoop 3.0?

  1. Java 8 (JDK 1.8) as the minimum runtime for Hadoop 3.0
  2. Erasure coding to reduce storage cost
  3. YARN Timeline Service v.2 (YARN-2928)
  4. New Default Ports for Several Services
  5. Intra-DataNode Balancer
  6. Shell Script Rewrite (HADOOP-9902)
  7. Shaded Client Jars
  8. Support for Opportunistic Containers
  9. MapReduce Task-Level Native Optimization
  10. Support for More than 2 NameNodes
  11. Support for Filesystem Connector
  12. Reworked Daemon and Task Heap Management
  13. Improved Fault-tolerance with Quorum Journal Manager
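
The storage saving from erasure coding (item 2) can be illustrated with a little arithmetic. The sketch below compares classic 3x replication with a Reed-Solomon layout of 6 data units and 3 parity units, which is one of the erasure coding policies shipped with Hadoop 3. The class name, method names, and the 100 TB data set size are assumptions made for illustration; this is not Hadoop code.

```java
// Illustrative arithmetic only: raw storage needed per logical byte.
class StorageOverhead {
    /** Raw bytes stored per logical byte when every block is copied `replicas` times. */
    static double replicationFactor(int replicas) {
        return replicas;
    }

    /** Raw bytes stored per logical byte for a Reed-Solomon (data, parity) layout. */
    static double erasureCodingFactor(int dataUnits, int parityUnits) {
        return (double) (dataUnits + parityUnits) / dataUnits;
    }

    public static void main(String[] args) {
        double logicalTb = 100.0; // hypothetical data set size in TB
        System.out.printf("3x replication: %.1f TB raw%n", logicalTb * replicationFactor(3));
        System.out.printf("RS(6,3):        %.1f TB raw%n", logicalTb * erasureCodingFactor(6, 3));
    }
}
```

Under these assumptions, replication needs 3 raw bytes per logical byte while RS(6,3) needs 1.5, so the same data occupies half the raw capacity.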

Read the complete feature details in the Hadoop 3.0 Enhancement & Feature blog.

Hadoop 3.0 Conceptual Interview Questions

Is Hadoop a framework or a Java library?

Hadoop is not a library; it is a framework that allows distributed processing of large data sets across clusters of computers (the worker nodes are sometimes called slaves) using a simple, fault-tolerant programming model. It is designed to scale out from a few machines to thousands, with each machine providing local computation and storage. The Hadoop framework itself is designed to detect and handle failures at the application layer.
Hadoop is written in Java and maintained by the Apache Software Foundation. It processes data in a reliable, fault-tolerant manner.
Core components of Hadoop:
HDFS (Storage) + MapReduce/YARN (Processing)
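
To make the processing side concrete, here is a minimal sketch of the MapReduce programming model as a word count, written in plain Java on a single machine. It deliberately does not use the Hadoop API: the class and method names are hypothetical, and a real job would run the map and reduce steps in parallel on the nodes that hold the data.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Single-machine illustration of the map -> shuffle -> reduce flow.
class WordCountSketch {
    /** Map phase: emit a (word, 1) pair for every word in one input line. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Shuffle phase: group all emitted values by key (sorted by key). */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum the values collected for one key. */
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    /** Runs the three phases in sequence over all input lines. */
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hadoop stores data", "hadoop processes data");
        // prints {data=2, hadoop=2, processes=1, stores=1}
        System.out.println(wordCount(lines));
    }
}
```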

Why do we need the Hadoop framework? Shouldn’t a DFS (Distributed File System) be able to handle large volumes of data already?

When a data set cannot fit on a single physical machine, a Distributed File System (DFS) partitions the data and stores and manages it across different machines. However, a DFS alone does not address the following technical challenges, which is why we need the Hadoop framework:

Fault tolerance:
When many machines are involved, the chance of data loss increases, so automatic fault tolerance and failure recovery become a prime concern.

Moving data to computation:
If huge amounts of data are moved from storage to the computation machines, processing speed is limited by network bandwidth. Hadoop instead moves the computation to the nodes where the data is stored.
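
Both concerns can be made concrete with rough arithmetic. The sketch below assumes that machines fail independently with the same probability and that a network link delivers its full nominal bandwidth. The class name, method names, and all numbers (a 1% daily failure chance, 1 TB of data, a 1 Gbit/s link) are hypothetical values chosen for illustration.

```java
// Back-of-the-envelope numbers behind the two concerns above.
class ClusterArithmetic {
    /**
     * Probability that at least one of `machines` fails, assuming each fails
     * independently with probability `p`: 1 - (1 - p)^machines.
     */
    static double probabilityOfAnyFailure(int machines, double p) {
        return 1.0 - Math.pow(1.0 - p, machines);
    }

    /** Seconds needed to move `bytes` over a link of `bitsPerSecond`, ignoring protocol overhead. */
    static double transferSeconds(double bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        // Hypothetical: each machine has a 1% chance of failing on a given day.
        System.out.printf("1 machine:     %.5f%n", probabilityOfAnyFailure(1, 0.01));
        System.out.printf("1000 machines: %.5f%n", probabilityOfAnyFailure(1000, 0.01));

        // Hypothetical: 1 TB (1e12 bytes) moved over a 1 Gbit/s (1e9 bits/s) link.
        double seconds = transferSeconds(1e12, 1e9);
        System.out.printf("1 TB over 1 Gbit/s: %.0f s (about %.1f hours)%n", seconds, seconds / 3600.0);
    }
}
```

With these assumed numbers, a failure somewhere in a 1000-machine cluster is almost certain on any given day, and shipping 1 TB across the network takes about 8000 seconds, which is why recovery has to be automatic and why processing data where it already sits is attractive.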

What is the difference between traditional RDBMS and Hadoop?

RDBMS                 Hadoop
Schema on write       Schema on read
Scale-up approach     Scale-out approach
Relational tables     Key-value format
Structured queries    Functional programming
Online transactions   Batch processing
