Apache Hive CLI Vs Beeline

Apache Hive development has shifted from the original Hive server (HiveServer1) to the new server (HiveServer2), so users and developers need to move to the new access tool. However, there's more to this process than simply switching the executable name from hive to beeline.

Apache Hive was originally a heavyweight command-line tool that accepted commands and ran them using MapReduce. Later, the tool was split into a client-server model, in which HiveServer1 is the server (responsible for compiling and monitoring MapReduce jobs) and Hive CLI is the command-line interface (which sends SQL to the server). More recently, the Hive community introduced HiveServer2, an enhanced Hive server designed for multi-client concurrency and improved authentication that also provides better support for clients connecting through JDBC and ODBC. HiveServer2, with Beeline as the command-line interface, is now the recommended solution; HiveServer1 and Hive CLI are deprecated, and the latter won't even work with HiveServer2.

The primary difference between the Hive CLI and Beeline involves how the clients connect to Apache Hive:

  • The Hive CLI connects directly to HDFS and the Hive Metastore, and can be used only on a host with access to those services.
  • Beeline connects to HiveServer2 and requires access to only one .jar file: hive-jdbc-<version>-standalone.jar

Server Connection

Hive CLI connects to a remote HiveServer1 instance using the Thrift protocol. To connect to a server, you specify the hostname and, optionally, the port number of the remote server:
 hive -h <hostname> -p <port>
In contrast, Beeline connects to a remote HiveServer2 instance using JDBC. Thus, the connection parameter is a JDBC URL of the kind common to JDBC-based clients:
 beeline -u <url> -n <username> -p <password>

Here are a few URL examples:

jdbc:hive2://ubuntu:11000/db2?hive.cli.conf.printheader=true;hive.exec.mode.local.auto.inputbytes.max=9999#stab=salesTable;icol=customerID
jdbc:hive2://?hive.cli.conf.printheader=true;hive.exec.mode.local.auto.inputbytes.max=9999#stab=salesTable;icol=customerID
jdbc:hive2://ubuntu:11000/db2;user=foo;password=bar
jdbc:hive2://server:10001/db;user=foo;password=bar?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2
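
The URLs above appear to share one shape: jdbc:hive2://host:port/database, followed by optional session variables after ";", Hive configuration properties after "?", and Hive variables after "#". To make that structure easier to see, here is a minimal Python sketch that splits such a URL into its parts. The function name parse_hive2_url and the keys of the returned dictionary are made up for this illustration; it is not part of Hive or of any JDBC driver, and it ignores features such as multiple hosts.

```python
PREFIX = "jdbc:hive2://"


def _pairs(part):
    """Turn 'a=1;b=2' into {'a': '1', 'b': '2'}; items without '=' are skipped."""
    return dict(item.split("=", 1) for item in part.split(";") if "=" in item)


def parse_hive2_url(url):
    """Split a jdbc:hive2 URL into its components (illustrative helper only)."""
    if not url.startswith(PREFIX):
        raise ValueError("expected a URL starting with " + PREFIX)
    rest = url[len(PREFIX):]
    # Peel the trailing sections off first: '#hive vars', then '?hive conf'.
    rest, _, hive_vars = rest.partition("#")
    rest, _, hive_conf = rest.partition("?")
    # What remains is 'host:port/database;session vars' (any part may be empty).
    authority, _, path = rest.partition("/")
    database, _, session_vars = path.partition(";")
    host, _, port = authority.partition(":")
    return {
        "host": host or None,
        "port": int(port) if port else None,
        "database": database or None,
        "session_vars": _pairs(session_vars),
        "hive_conf": _pairs(hive_conf),
        "hive_vars": _pairs(hive_vars),
    }


parts = parse_hive2_url(
    "jdbc:hive2://server:10001/db;user=foo;password=bar"
    "?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2"
)
print(parts["host"], parts["port"], parts["session_vars"], parts["hive_conf"])
```

Applied to the second example, which has no host or port, the same function returns None for host, port, and database.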

Apache Hive CLI VS Beeline: Query Execution

Executing queries in Beeline is very similar to executing them in Hive CLI. In Hive CLI:
 hive -e <query in quotes>
 hive -f <query file name>
In Beeline:
 beeline -e <query in quotes>
 beeline -f <query file name>

In either case, if neither -e nor -f is given, both client tools go into an interactive mode in which you can enter and execute queries or commands line by line.
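
When Beeline is driven from a script, it can help to assemble the argument list in code rather than by string concatenation. The following Python sketch builds a Beeline command line from the options mentioned in this article (-u, -n, -p, -e, -f). The helper name beeline_command is hypothetical, and treating -e and -f as mutually exclusive is a simplification made for this sketch, not a documented Beeline rule.

```python
import shlex


def beeline_command(url, username=None, password=None, query=None, script=None):
    """Build an argv list for Beeline (illustrative helper, not part of Hive)."""
    # Simplification for this sketch: accept a query or a script file, not both.
    if query is not None and script is not None:
        raise ValueError("pass either query (-e) or script (-f), not both")
    argv = ["beeline", "-u", url]
    if username is not None:
        argv += ["-n", username]
    if password is not None:
        argv += ["-p", password]
    if query is not None:
        argv += ["-e", query]
    elif script is not None:
        argv += ["-f", script]
    # With neither -e nor -f, Beeline starts in interactive mode.
    return argv


cmd = beeline_command("jdbc:hive2://ubuntu:11000/db2", username="foo", query="SHOW TABLES;")
# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run on a host where beeline is on the PATH, which avoids shell quoting problems in the query text.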

Apache Hive CLI VS Beeline: Variables

There are four namespaces for variables:

  • hiveconf for Hive configuration variables
  • system for system variables
  • env for environment variables
  • hivevar for Hive variables (HIVE-1096)

There are two ways to define a variable: as a command-line argument or by using the set command in interactive mode.

Defining Hive variables in the command line in Hive CLI:

 hive -d key=value
 hive --define key=value
 hive --hivevar key=value

Defining Hive variables in the command line in Beeline:

 beeline --hivevar key=value
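
Once defined, a variable is referenced inside a query as ${namespace:name}, for example ${hivevar:key}. The Python sketch below mimics that lookup across the four namespaces to show how a reference resolves. It is only an illustration of the idea, not Hive's own substitution code. The function name substitute, the table name sales, and the max_rows property are invented for the example.

```python
import os
import re

# Matches references such as ${hivevar:tbl} or ${env:HOME}.
_REFERENCE = re.compile(r"\$\{(hiveconf|system|env|hivevar):([^}]+)\}")


def substitute(query, hivevar=None, hiveconf=None, system=None, env=None):
    """Replace ${namespace:name} references in a query string (illustration only)."""
    namespaces = {
        "hivevar": hivevar or {},
        "hiveconf": hiveconf or {},
        "system": system or {},
        # Default to the real process environment for the env namespace.
        "env": os.environ if env is None else env,
    }

    def replace(match):
        namespace, name = match.group(1), match.group(2)
        # In this sketch, a reference to an undefined variable is left as written.
        return str(namespaces[namespace].get(name, match.group(0)))

    return _REFERENCE.sub(replace, query)


resolved = substitute(
    "SELECT * FROM ${hivevar:tbl} LIMIT ${hiveconf:max_rows}",
    hivevar={"tbl": "sales"},         # as if started with --hivevar tbl=sales
    hiveconf={"max_rows": "10"},      # hypothetical property, for illustration
)
print(resolved)
```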

Beeline Operating Modes and HiveServer2 Transport Modes

Beeline supports the following modes of operation:

  • Embedded: The Beeline client and the Hive installation both reside on the same host machine. No TCP connectivity is required.
  • Remote: Use remote mode to support multiple, concurrent clients executing queries against the same remote Hive installation. Remote transport mode supports authentication with LDAP and Kerberos. It also supports encryption with SSL. TCP connectivity is required.

Administrators may start HiveServer2 in one of the following transport modes:

  • TCP: HiveServer2 uses TCP transport for sending and receiving Thrift RPC messages.
  • HTTP: HiveServer2 uses HTTP transport for sending and receiving Thrift RPC messages.

While running in TCP transport mode, HiveServer2 supports the following authentication schemes: