Apache Hive development has shifted from the original Hive server (HiveServer1) to the new server (HiveServer2), so users and developers need to move to the new access tool. However, there is more to this process than simply switching the executable name from “hive” to “beeline”.
Apache Hive started as a heavyweight command-line tool that accepted commands and ran them using MapReduce. Later, the tool was split into a client-server model, in which HiveServer1 is the server (responsible for compiling and monitoring MapReduce jobs) and Hive CLI is the command-line interface (it sends SQL to the server). More recently, the Hive community introduced HiveServer2, an enhanced Hive server designed for multi-client concurrency and improved authentication that also provides better support for clients connecting through JDBC and ODBC. HiveServer2, with Beeline as the command-line interface, is now the recommended solution; HiveServer1 and Hive CLI are deprecated, and the latter will not even work with HiveServer2.
The primary difference between the Hive CLI and Beeline is how each client connects to Apache Hive:
- The Hive CLI connects directly to HDFS and the Hive Metastore, and can be used only on a host with access to those services.
- Beeline connects to HiveServer2 and requires access to only one .jar file.
Hive CLI connects to a remote HiveServer1 instance using the Thrift protocol. To connect to a server, you specify the hostname and optionally the port number of the remote server:
> hive -h <hostname> -p <port>
In contrast, Beeline connects to a remote HiveServer2 instance using JDBC. Thus, the connection parameter is a JDBC URL that’s common in JDBC-based clients:
> beeline -u <url> -n <username> -p <password>
Here are a few URL examples:
jdbc:hive2://ubuntu:11000/db2?hive.cli.conf.printheader=true;hive.exec.mode.local.auto.inputbytes.max=9999#stab=salesTable;icol=customerID
jdbc:hive2://?hive.cli.conf.printheader=true;hive.exec.mode.local.auto.inputbytes.max=9999#stab=salesTable;icol=customerID
jdbc:hive2://ubuntu:11000/db2;user=foo;password=bar
jdbc:hive2://server:10001/db;user=foo;password=bar?hive.server2.transport.mode=http;hive.server2.thrift.http.path=hs2
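To make the structure of these URLs easier to see, here is a minimal Python sketch that splits one into its parts. It assumes the layout jdbc:hive2://<host>:<port>/<db>;<session variables>?<Hive configuration>#<Hive variables>. The function name split_hive_jdbc_url and the dictionary keys are hypothetical names chosen for illustration and are not part of any Hive API.

```python
def _pairs(items):
    """Turn ['k=v', ...] into a dict, skipping empty items."""
    result = {}
    for item in items:
        if item:
            key, _, value = item.partition("=")
            result[key] = value
    return result


def split_hive_jdbc_url(url):
    """Split a HiveServer2 JDBC URL into labelled parts (illustrative only)."""
    prefix = "jdbc:hive2://"
    if not url.startswith(prefix):
        raise ValueError("not a HiveServer2 JDBC URL: " + url)
    rest = url[len(prefix):]

    # Peel off the trailing sections first:
    # '#' starts the Hive variables, '?' starts the Hive configuration.
    rest, _, hive_vars = rest.partition("#")
    rest, _, hive_conf = rest.partition("?")

    # What remains is host:port/db followed by ';'-separated session variables.
    location, *session_items = rest.split(";")
    authority, _, database = location.partition("/")

    # Assumption: a plain host name or IPv4 address (IPv6 is not handled here).
    host, _, port = authority.partition(":")

    return {
        "host": host or None,
        "port": int(port) if port else None,
        "database": database or None,
        "session_vars": _pairs(session_items),
        "hive_conf": _pairs(hive_conf.split(";")),
        "hive_vars": _pairs(hive_vars.split(";")),
    }


if __name__ == "__main__":
    example = "jdbc:hive2://ubuntu:11000/db2;user=foo;password=bar"
    print(split_hive_jdbc_url(example))
```

Running the sketch against the examples above shows which settings travel as session variables (after the database name), which as Hive configuration (after "?"), and which as Hive variables (after "#").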
Apache Hive CLI VS Beeline: Query Execution
Executing queries in Beeline is very similar to that in Hive CLI. In Hive CLI:
> hive -e <query in quotes>
> hive -f <query file name>
In Beeline:
> beeline -e <query in quotes>
> beeline -f <query file name>
If neither -e nor -f is given, both client tools enter an interactive mode in which you can type and execute queries or commands line by line.
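When Beeline is driven from a script, it can help to assemble the argument list in one place. The following is a minimal Python sketch that uses only the options mentioned in this article (-u, -n, -p, -e, -f). The function name beeline_command is a hypothetical helper, and the sketch only builds the command; it does not run Beeline.

```python
import shlex


def beeline_command(url, username=None, password=None, query=None, query_file=None):
    """Build an argument list for a Beeline invocation (illustrative only)."""
    # For simplicity this sketch accepts an inline query or a file, not both.
    if query is not None and query_file is not None:
        raise ValueError("pass either query or query_file, not both")

    argv = ["beeline", "-u", url]
    if username is not None:
        argv += ["-n", username]
    if password is not None:
        argv += ["-p", password]
    if query is not None:
        argv += ["-e", query]        # run a single query given inline
    elif query_file is not None:
        argv += ["-f", query_file]   # run the queries stored in a file
    # With neither -e nor -f, Beeline starts in interactive mode.
    return argv


if __name__ == "__main__":
    cmd = beeline_command("jdbc:hive2://ubuntu:11000/db2", username="foo", query="SELECT 1")
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(cmd))
```

Keeping the arguments as a list, rather than one string, avoids quoting problems if the list is later passed to something like subprocess.run.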
Apache Hive CLI VS Beeline: Variables
There are four namespaces for variables:
hiveconf for Hive configuration variables
system for system variables
env for environment variables
hivevar for Hive variables (HIVE-1096)
There are two ways to define a variable: as a command-line argument or using the set command in interactive mode.
Defining Hive variables in the command line in Hive CLI:
> hive -d key=value
> hive --define key=value
> hive --hivevar key=value
Defining Hive variables in the command line in Beeline:
> beeline --hivevar key=value
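When migrating wrapper scripts, the three Hive CLI spellings above collapse into Beeline's single --hivevar option. The following minimal Python sketch shows one way to do that rewrite. The function name to_beeline_hivevars is hypothetical, and the sketch assumes each flag is followed by a separate key=value argument, as in the examples above.

```python
def to_beeline_hivevars(hive_cli_args):
    """Collect Hive CLI variable definitions and emit Beeline --hivevar arguments.

    Arguments other than -d, --define and --hivevar are ignored here;
    translating the rest of a Hive CLI command line is out of scope.
    """
    variable_flags = {"-d", "--define", "--hivevar"}
    result = []
    args = iter(hive_cli_args)
    for arg in args:
        if arg in variable_flags:
            try:
                definition = next(args)   # the key=value that follows the flag
            except StopIteration:
                raise ValueError(arg + " needs a key=value argument") from None
            if "=" not in definition:
                raise ValueError("expected key=value, got: " + definition)
            result += ["--hivevar", definition]
    return result


if __name__ == "__main__":
    old_args = ["-d", "year=2014", "--define", "region=EU", "--hivevar", "tbl=sales"]
    print(to_beeline_hivevars(old_args))
```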
Beeline Operating Modes and HiveServer2 Transport Modes
Beeline supports the following modes of operation:
Embedded: The Beeline client and the Hive installation both reside on the same host machine. No TCP connectivity is required.
Remote: Use remote mode to support multiple, concurrent clients executing queries against the same remote Hive installation. Remote transport mode supports authentication with LDAP and Kerberos. It also supports encryption with SSL. TCP connectivity is required.
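The two modes can be told apart from the connection URL. The minimal Python sketch below assumes that a URL with no host and port (such as the jdbc:hive2://?... example earlier) selects embedded mode and that any URL naming a host selects remote mode. The function name beeline_mode is hypothetical.

```python
def beeline_mode(url):
    """Return 'embedded' or 'remote' for a HiveServer2 JDBC URL (illustrative only)."""
    prefix = "jdbc:hive2://"
    if not url.startswith(prefix):
        raise ValueError("not a HiveServer2 JDBC URL: " + url)
    rest = url[len(prefix):]

    # Keep only the host:port part by cutting at each delimiter that can follow it.
    for delimiter in "/;?#":
        rest = rest.split(delimiter, 1)[0]

    # Assumption: an empty host:port part means Beeline runs Hive in-process.
    return "remote" if rest else "embedded"


if __name__ == "__main__":
    for url in ("jdbc:hive2://", "jdbc:hive2://ubuntu:11000/db2"):
        print(url, "->", beeline_mode(url))
```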
Administrators may start HiveServer2 in one of the following transport modes:
TCP: HiveServer2 uses TCP transport for sending and receiving Thrift RPC messages.
HTTP: HiveServer2 uses HTTP transport for sending and receiving Thrift RPC messages.
While running in TCP transport mode, HiveServer2 supports the following authentication schemes: