Apache Hive vectorization was introduced to improve query performance. By default, the Hive query execution engine processes one row of a table at a time. Each row passes through all the operators in the query before the next row is processed, which makes very inefficient use of the CPU. In vectorized query execution, … read the rest
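The difference between the two execution models can be illustrated with a toy Python sketch (this is not Hive's actual engine, just an analogy; Hive's vectorized path actually processes batches of about 1,024 rows through each operator at once):

```python
# Toy sketch (not Hive's real engine): contrast row-at-a-time execution
# with vectorized (batch) execution of a simple filter + project pipeline.

ROWS = [(i, i * 2) for i in range(10)]  # (id, value) rows

def row_at_a_time(rows):
    """Each row passes through every operator before the next row starts."""
    out = []
    for row in rows:
        if row[1] > 6:          # filter operator
            out.append(row[0])  # project operator
    return out

def vectorized(rows, batch_size=4):
    """Operators work on whole column batches, amortizing per-row overhead."""
    out = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        ids = [r[0] for r in batch]     # column vector: id
        values = [r[1] for r in batch]  # column vector: value
        out.extend(i for i, v in zip(ids, values) if v > 6)
    return out

print(row_at_a_time(ROWS))  # [4, 5, 6, 7, 8, 9]
print(vectorized(ROWS))     # [4, 5, 6, 7, 8, 9]
```

Both paths produce the same result; the vectorized one simply touches each operator once per batch instead of once per row, which is where the CPU savings come from.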
Apache Hive Release 3.1.1 is compatible with Hadoop 3.x.y and includes four bug fixes and one new feature.
Apache Hive Release 3.1.1 Release Notes
The following bug fixes are part of this release:
- [HIVE-18767] – Some alterPartitions invocations throw ‘NumberFormatException: null’
- [HIVE-18778] – Needs to capture input/output entities in explain
- [HIVE-20906] –
The Apache Hive Cheat Sheet is a summary of all functions and syntax, for big data engineers and developers to use as a reference. It is divided into five parts:
Apache Hive Cheat Sheet – Query Syntax
Apache Hive Cheat Sheet – Metadata
Apache Hive Cheat Sheet – Query Compatibility
Apache Hive Cheat Sheet – Command Line
Apache Hive Cheat Sheet – Shell & CLI… read the rest
As a big data engineer, you must know the Apache Hive best practices. Apache Hive is not an RDBMS, but it pretends to be one most of the time: it has tables, it runs SQL, and it supports both JDBC and ODBC. Hive lets you use SQL on Hadoop, but tuning SQL on a distributed system is a different discipline. … read the rest
Apache Hive development has shifted from the original Hive server (HiveServer1) to the new server (HiveServer2), and hence users and developers need to move to the new access tool. However, there is more to this migration than simply switching the executable name from “hive” to “beeline”. The Hive CLI was a heavyweight command-line tool that accepted commands and ran … read the rest
In this article, Apache Sqoop Introduction, we will primarily discuss why this tool exists. Apache Sqoop is not part of the Hadoop core project; it is part of the Hadoop ecosystem.
Sqoop is the big data tool we use for transferring data between Hadoop and relational database servers. The name Sqoop stands for “SQL-to-Hadoop”.
In addition, there are … read the rest
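One detail worth previewing is how Sqoop parallelizes an import: it splits the table's key range into chunks, one per mapper. The sketch below is an illustration of that split-by idea in plain Python, not Sqoop code; the function name is hypothetical:

```python
# Illustrative sketch (not Sqoop source): how Sqoop's --split-by option
# divides a numeric primary-key range into roughly equal chunks, one per
# parallel mapper (controlled by -m / --num-mappers).

def split_key_range(min_key, max_key, num_mappers):
    """Return (lo, hi) inclusive bounds, one pair per mapper."""
    total = max_key - min_key + 1
    base, extra = divmod(total, num_mappers)
    splits, lo = [], min_key
    for m in range(num_mappers):
        size = base + (1 if m < extra else 0)  # spread any remainder
        hi = lo + size - 1
        splits.append((lo, hi))
        lo = hi + 1
    return splits

# e.g. a table with ids 1..100 imported with 4 mappers:
print(split_key_range(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each mapper then issues its own bounded SELECT against the source database, so the import runs in parallel.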
The Beginners Impala Tutorial covers the key concepts of Impala, an in-memory computation technology developed by Cloudera. MapReduce-based frameworks like Hive are slow due to excessive I/O operations, so Cloudera offers a separate tool, which we call Apache Impala. This Beginners Impala Tutorial will cover the whole concept of Cloudera Impala and how this Massive … read the rest
Hadoop 3.0 and big data jobs are in demand, and this Hadoop 3.0 Interview Questions article covers almost all the important topics, including reference links to other tutorials.
Hadoop 3.0 New Features Questions
What are the new features in Hadoop 3.0?
- Java 8 (JDK 1.8) as the minimum runtime for Hadoop 3.0
- Erasure coding to reduce storage cost
- YARN Timeline Service v2
This short article compares UNIX shells, which confuse many technical folks. The original Unix operating system used a shell program called the Bourne shell. Then, slowly, many other shells were developed for different flavors of the UNIX operating system. The following is some brief information about the different shells:
- sh—Bourne shell
- csh—C shell
Hadoop 3.0 GPU: Hadoop still falls short of high-performance computing capacity because of the limited parallelism of CPUs. GPU (Graphics Processing Unit) accelerated computing uses a GPU together with a CPU to accelerate data-processing applications on a GPU cluster for higher efficiency. However, a GPU cluster has low data storage capacity
Leveraging Hadoop 3.0 GPU Computing
MapReduce … read the rest