Spark Model of Parallel Computing

The Spark model of parallel computing centers on the RDD (Resilient Distributed Dataset), an important API that is part of the Spark Core library.

Spark allows users to write a program for the driver (or master node) on a cluster computing system that can perform operations on data in parallel. Spark represents large datasets as RDDs—immutable, distributed collections of … read the rest
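
To make the RDD model concrete, here is a minimal Scala sketch (the app name, local master, and sample data are illustrative assumptions, not taken from the article):

  import org.apache.spark.{SparkConf, SparkContext}

  object RddSketch {
    def main(args: Array[String]): Unit = {
      // The driver program creates a SparkContext, the entry point to the cluster.
      val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
      val sc   = new SparkContext(conf)

      // An RDD: an immutable collection partitioned across the cluster's nodes.
      val numbers = sc.parallelize(1 to 1000)

      // Transformations such as map run in parallel, one task per partition.
      val squares = numbers.map(n => n.toLong * n)

      // An action such as reduce triggers the actual distributed computation.
      println("Sum of squares: " + squares.reduce(_ + _))

      sc.stop()
    }
  }

Note how the driver only coordinates the job; the map and reduce work happens on the executors that hold the partitions.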

How Apache Spark Works

Why should you care about how Apache Spark works? To get the most out of Spark, it is important to understand some of the principles used to design Spark and, at a cursory level, how Spark programs are executed. This article introduces the overall design of Spark as well as its place in the big data ecosystem. Spark is often considered … read the rest
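
One design principle worth previewing here is lazy evaluation: transformations only record a lineage of operations, and nothing executes until an action is called. A minimal sketch, assuming a spark-shell session where the SparkContext sc already exists (the file name is a placeholder):

  // Transformations are lazy: these lines only build the execution plan (a DAG).
  val lines  = sc.textFile("input.txt")
  val words  = lines.flatMap(_.split("\\s+"))
  val counts = words.map(w => (w, 1)).reduceByKey(_ + _)

  // Only this action makes Spark schedule stages and run tasks on the cluster.
  counts.take(5).foreach(println)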

Apache Sqoop Introduction

In this article, Apache Sqoop Introduction, we will primarily discuss why this tool exists. Apache Sqoop is not part of the Hadoop core project; it is part of the Hadoop ecosystem.

Sqoop is the big data tool we use for transferring data between Hadoop and relational database servers. The name Sqoop is short for "SQL-to-Hadoop".

In addition, there are … read the rest

Beginners Impala Tutorial

The Beginners Impala Tutorial covers the key concepts of the in-memory computation technology called Impala, developed by Cloudera. MapReduce-based frameworks like Hive are slow due to excessive I/O operations, so Cloudera offers a separate tool, which we call Apache Impala. This Beginners Impala Tutorial will cover the whole concept of Cloudera Impala and how this Massive … read the rest

Hadoop 3.0 Interview Question

Hadoop 3.0 and big data jobs are in demand, and this Hadoop 3.0 Interview Question article covers almost all the important topics, including reference links to other tutorials.

Hadoop 3.0 New Features Questions

What are the new features in Hadoop 3.0?

  1. Java 8 (JDK 1.8) as the runtime for Hadoop 3.0
  2. Erasure coding to reduce storage cost (for example, a Reed-Solomon (6,3) layout adds only 50% storage overhead, versus 200% for the default 3x replication)
  3. YARN Timeline Service v2
read the rest

Compare Unix Kernel Shells

This short article compares UNIX shells, which many technical folks find confusing. The original Unix operating system used a shell program called the Bourne shell. Then, slowly, many other shells were developed for different flavors of the UNIX operating system. The following is some brief information about the different shells:

  • sh—Bourne shell
  • csh—C shell
  • ksh—Korn shell
read the rest

Hadoop 3.0 GPU

Hadoop 3.0 GPU: Hadoop still lags in high-performance computing because of CPUs' limited parallelism. GPU (Graphics Processing Unit) accelerated computing uses a GPU together with a CPU to accelerate data processing on a GPU cluster for higher efficiency. However, a GPU cluster has relatively low data storage capacity.

Leveraging Hadoop 3.0 GPU Computing

MapReduce … read the rest

Hadoop 3.0 Docker

Hadoop 3.0 Docker: Docker enables users to bundle an application together with its preferred execution environment. In this article, we will talk about Hadoop and Docker together: what the benefits are, and how to do a single-node setup.

Hadoop 3.0 Docker: Benefits

  1. Sets up, installs, and runs Hadoop 3.0 in no time.
  2. Uses the available resources as needed, so no
read the rest