Hadoop 3.0 and Big Data jobs are in demand, and this Hadoop 3.0 interview question article covers almost all of the important topics, including reference links to other tutorials.
Hadoop 3.0 New Features Questions
What are the new features in Hadoop 3.0?
- Java 8 (JDK 1.8) as the minimum runtime for Hadoop 3.0
- Erasure Coding in HDFS to reduce storage cost
- YARN Timeline Service v.2 (YARN-2928)
- New Default Ports for Several Services
- Intra-DataNode Balancer
- Shell Script Rewrite (HADOOP-9902)
- Shaded Client Jars
- Support for Opportunistic Containers
- MapReduce Task-Level Native Optimization
- Support for More than 2 NameNodes
- Support for New Filesystem Connectors (Microsoft Azure Data Lake, Aliyun Object Storage System)
- Reworked Daemon and Task Heap Management
- Improved Fault-tolerance with Quorum Journal Manager
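Erasure coding is the feature in this list with the clearest arithmetic. The Java sketch below compares the raw storage needed for 3x replication (the HDFS default) with a Reed-Solomon layout of 6 data units plus 3 parity units, the scheme behind Hadoop's built-in RS-6-3-1024k policy. It only does the storage math and does not use any Hadoop API; the class name and the 6 TB figure are illustrative assumptions.

```java
/**
 * Compares raw storage for n-way replication with a Reed-Solomon
 * erasure-coding layout of k data units plus m parity units.
 */
final class StorageOverhead {

    /** Raw bytes stored for logicalBytes under n-way replication. */
    static double replicatedBytes(double logicalBytes, int replicas) {
        return logicalBytes * replicas;
    }

    /** Raw bytes stored for logicalBytes under an RS(k, m) layout. */
    static double erasureCodedBytes(double logicalBytes, int dataUnits, int parityUnits) {
        return logicalBytes * (dataUnits + parityUnits) / dataUnits;
    }

    /** Extra storage as a fraction of the logical size (0.5 means 50% overhead). */
    static double overhead(double rawBytes, double logicalBytes) {
        return (rawBytes - logicalBytes) / logicalBytes;
    }

    public static void main(String[] args) {
        double logical = 6.0; // assumed: 6 TB of user data
        double rep = replicatedBytes(logical, 3);     // 3 full copies
        double ec = erasureCodedBytes(logical, 6, 3); // 6 data + 3 parity units
        System.out.printf("3x replication: %.1f TB raw, %.0f%% overhead%n", rep, 100 * overhead(rep, logical));
        System.out.printf("RS(6,3):        %.1f TB raw, %.0f%% overhead%n", ec, 100 * overhead(ec, logical));
    }
}
```

With these numbers, 3x replication stores 18 TB (200% overhead) and survives the loss of any 2 copies, while RS(6,3) stores 9 TB (50% overhead) and survives the loss of any 3 of its 9 units. The trade-off is extra CPU and network work to encode writes and to reconstruct lost units.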
Read the complete feature details in the Hadoop 3.0 Enhancement & Feature blog.
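The fault tolerance of the Quorum Journal Manager comes from majority-quorum arithmetic: an edit written by the active NameNode is durable once a majority of JournalNodes accept it. The minimal Java sketch below shows only that arithmetic; it is not the QJM implementation, and the class and method names are hypothetical.

```java
/**
 * Majority-quorum arithmetic for a set of JournalNodes.
 */
final class JournalQuorum {

    /** Minimum acknowledgements that form a majority of n nodes. */
    static int majority(int journalNodes) {
        return journalNodes / 2 + 1;
    }

    /** How many JournalNodes may fail while a majority is still reachable. */
    static int toleratedFailures(int journalNodes) {
        return (journalNodes - 1) / 2;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 5; n++) {
            System.out.printf("%d JournalNodes: quorum=%d, tolerates %d failure(s)%n",
                    n, majority(n), toleratedFailures(n));
        }
    }
}
```

Four JournalNodes tolerate no more failures than three, which is why the HDFS high-availability documentation recommends running an odd number of JournalNodes, at least 3.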
Hadoop 3.0 Conceptual Interview Questions
Is Hadoop a framework or a Java library?
Hadoop is not a library; it is a framework that allows distributed processing of large data sets across clusters of computers (nodes, sometimes called slaves) using a simple, fault-tolerant programming model. It is designed to scale out from a few machines to thousands, each of which provides local computation and storage. The framework itself is designed to detect and handle failures at the application layer.
Hadoop is written in Java and developed by the Apache Software Foundation. It processes data reliably and in a fault-tolerant manner.
Core components of Hadoop:
HDFS (storage) + YARN (resource management) + MapReduce (processing)
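To make "a framework with a simple programming model" concrete, here is a single-process Java simulation of the MapReduce word-count flow. It deliberately avoids the Hadoop API (`Mapper`, `Reducer`, `Job`), and the class and method names are hypothetical. In a real cluster, HDFS would hold the input blocks, YARN would allocate containers, and each map call would run on a node that stores its block.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Single-process simulation of the MapReduce model (no Hadoop dependency):
 * map emits (word, 1), shuffle groups by key, reduce sums each group.
 */
final class WordCountSimulation {

    // Map phase: one input line -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group values by key; TreeMap keeps keys sorted, as Hadoop sorts map output.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line)); // in Hadoop, each split could be mapped on a different node
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffle(pairs).forEach((word, values) -> result.put(word, reduce(values)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("HDFS stores data", "YARN schedules work on data")));
    }
}
```

Running `main` prints `{data=2, hdfs=1, on=1, schedules=1, stores=1, work=1, yarn=1}`. The developer writes only the map and reduce logic; distribution, scheduling, and failure handling are the framework's job.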
Why the Hadoop framework? Shouldn't a DFS (Distributed File System) already be able to handle large volumes of data?
There are business scenarios in which a data set cannot fit on a single physical machine. A Distributed File System (DFS) then partitions the data and stores and manages it across different machines. However, a plain DFS does not address the following technical challenges, which is why we need the Hadoop framework:
Fault tolerance:
When many machines are involved, the chance that one of them fails and data is lost increases. Automatic fault tolerance and failure recovery therefore become a prime concern.
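Moving data to computation:
If huge amounts of data have to be moved from storage to the machines doing the computation, processing speed is limited by network bandwidth. Hadoop avoids this by moving the computation to the nodes that already hold the data (data locality).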
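To put rough numbers on both challenges, here is a small Java sketch. The per-machine failure probability (0.1% per day), the assumption that failures are independent, and the single 1 Gbps link are illustrative assumptions, not Hadoop defaults or measurements.

```java
/**
 * Back-of-envelope numbers for the two DFS challenges.
 * All inputs in main are illustrative assumptions.
 */
final class ClusterMath {

    /** P(at least one of n machines fails), each failing independently with probability p. */
    static double probAnyFailure(int machines, double p) {
        return 1.0 - Math.pow(1.0 - p, machines);
    }

    /** Seconds to move dataBytes over a link of linkBitsPerSecond (ignores protocol overhead). */
    static double transferSeconds(double dataBytes, double linkBitsPerSecond) {
        return dataBytes * 8 / linkBitsPerSecond;
    }

    public static void main(String[] args) {
        double p = 0.001; // assumed daily failure probability of one machine
        System.out.printf("P(any failure), 1 machine:     %.3f%n", probAnyFailure(1, p));
        System.out.printf("P(any failure), 1000 machines: %.3f%n", probAnyFailure(1000, p));

        double oneTB = 1e12;   // bytes
        double oneGbps = 1e9;  // bits per second
        double secs = transferSeconds(oneTB, oneGbps);
        System.out.printf("1 TB over 1 Gbps: %.0f s (~%.1f h)%n", secs, secs / 3600);
    }
}
```

Under these assumptions a 1,000-node cluster sees at least one machine failure on about 63% of days, and copying 1 TB over one 1 Gbps link takes 8,000 seconds (more than two hours) before protocol overhead. HDFS addresses the first problem with block replication (3 copies by default, set by `dfs.replication`), and MapReduce addresses the second by scheduling tasks on the nodes that store the input blocks.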
What is the difference between traditional RDBMS and Hadoop?
| RDBMS | Hadoop |
|---|---|
| Schema on write | Schema on read |
| Scale-up approach | Scale-out approach |
| Relational tables | Key-value format |
| Structured queries | Functional programming |
| Online transactions | Batch processing |
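The first row of the table, schema on write versus schema on read, is easiest to see in code. In this hypothetical Java sketch, raw lines are stored exactly as received (as when a file is copied into HDFS), and an assumed `name,age` schema is applied only when the data is read. Rows that do not fit are skipped at read time instead of being rejected at write time, as an RDBMS would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Schema on read: store raw records untouched, apply a schema only when reading.
 * The "name,age" layout and all names here are hypothetical examples.
 */
final class SchemaOnRead {

    /** The schema one particular reader chooses to apply. */
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    /** Write path: accept any line as-is, with no validation. */
    static List<String> store(String... rawLines) {
        return new ArrayList<>(Arrays.asList(rawLines));
    }

    /** Read path: parse with the schema now, skipping rows that do not fit it. */
    static List<Person> readAsPeople(List<String> stored) {
        List<Person> people = new ArrayList<>();
        for (String line : stored) {
            String[] fields = line.split(",");
            if (fields.length != 2) {
                continue; // wrong shape for this schema
            }
            try {
                people.add(new Person(fields[0].trim(), Integer.parseInt(fields[1].trim())));
            } catch (NumberFormatException e) {
                // age is not a number: this reader skips the row
            }
        }
        return people;
    }

    public static void main(String[] args) {
        List<String> raw = store("alice,34", "bob,not-a-number", "free text log line", "carol,29");
        for (Person p : readAsPeople(raw)) {
            System.out.println(p.name + " is " + p.age);
        }
    }
}
```

A different reader could apply a different schema to the same stored lines. That flexibility is what schema on read buys, at the cost of discovering bad data only when it is queried.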