All Stories

Transpose & Pivot In Hive Query

Apache Hive does not have a direct, standard UDF for transposing rows into columns. Transposing and pivoting in a Hive query can be achieved with a multi-stage process. You can use collect_list() or...

Apache Hive Vectorization

Apache Hive Vectorization was introduced to improve query performance. By default, the Apache Hive query execution engine processes one row of a table at a time....

Apache Hive Release 3.1.1

Apache Hive Release 3.1.1 is compatible with Hadoop 3.x.y. It fixes four bugs and adds one new feature. Apache Hive Release 3.1.1 Release Note: Following Bug Fixes...

Apache Hive Cli Vs Beeline

Apache Hive development has shifted from the original Hive server (HiveServer1) to the new server (HiveServer2), and hence users and developers need to move to the new access tool. However,...

Apache Hive Cheat Sheet

The Apache Hive Cheat Sheet is a summary of functions and syntax for big data engineers and developers to reference. It is divided into five parts. Apache Hive Cheat Sheet -...

Apache Hive Best Practice

As a big data engineer, you must know Apache Hive best practices. As you know, Apache Hive is not an RDBMS, but it pretends to be one most of the time. It...

Apache Hive Analytical Functions

Apache Hive analytical functions, available since Hive 0.11.0, are a special group of functions that scan multiple input rows to compute each output value. Apache Hive analytical functions are usually...

Snowflake Architecture Cheat Sheet

Architecture: no software, no hardware, no maintenance. Snowflake is provided as Software-as-a-Service (SaaS) that runs completely on cloud infrastructure. Snowflake uses a central data repository for persisted data that is...

PySpark Tutorial

In this PySpark tutorial, we will look at why PySpark is becoming popular among data engineers and data scientists. This tutorial will also highlight the key limitations of PySpark over...

What Is Data Lineage

What is data lineage, and why is it important? Data lineage describes the origins of data and the transformations it goes through over time. Data lineage can also be expressed...

Data Lineage Vs Data Provenance

Data lineage and data provenance are not the same thing. Many data engineers and architects use the terms interchangeably, but they are two different concepts, each with its own meaning.

Spark Dataframe Minus Minutes Operation In Scala

How to perform a minus operation on a date or timestamp type.

What Is Apache NiFi

Apache NiFi is a software project from the Apache Software Foundation designed to automate the flow of data into and out of software systems (file systems, RDBMSs, APIs, etc.)....

Interacting With Windows Registry Using Chef

One of the most well-known differences between managing UNIX-like systems and Windows systems is the Windows Registry. Chef has resources for creating, modifying, and deleting Windows Registry keys. Beware that...

Installing Software Packages In Windows Using Chef

A large number of managed systems require configuration of software that is outside the scope of the built-in Windows roles and features. Chef has a very handy resource for installing...