Hive architecture - Big Data In Real World

Hive architecture

In this post we will explain the architecture of Hive along with the various components involved and their functions.

[Figure: Hive architecture diagram]

HiveServer2

HiveServer2 (HS2) is an improved implementation of the original HiveServer (HiveServer1) and was introduced in Hive 0.11. HiveServer2 is responsible for the following functions:

  • A Thrift service that supports concurrent client connections and sessions
  • Support for common ODBC and JDBC drivers
  • Authentication via Kerberos, LDAP and other pluggable implementations
  • Authorization
  • Query optimization and execution

HiveServer2 is a container for the Hive execution engine. For each client connection, it creates a new execution context that serves Hive SQL requests from the client.
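To make the per-connection idea concrete, here is a toy model in Java. It assumes nothing about HiveServer2's internals: the class `ToyHiveServer2` and its methods are hypothetical and are not part of any Hive API. Each opened session gets its own copy of the server defaults, so a setting changed in one session (much like running `SET hive.execution.engine=tez;` in one Beeline session) does not leak into another.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Toy model only, NOT HiveServer2's real API. It illustrates that each client
// connection gets its own session (execution context) with its own settings.
class ToyHiveServer2 {
    // Server-wide defaults; each new session starts from a copy of these.
    private final Map<String, String> defaults = new HashMap<>();
    // One context per open session, keyed by a session id.
    private final Map<Long, Map<String, String>> sessions = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong(1);

    ToyHiveServer2(Map<String, String> serverDefaults) {
        defaults.putAll(serverDefaults);
    }

    // Opening a connection creates a fresh context seeded from the defaults.
    long openSession() {
        long id = nextId.getAndIncrement();
        sessions.put(id, new HashMap<>(defaults));
        return id;
    }

    // Similar in spirit to running "SET key=value;" inside one session.
    void set(long sessionId, String key, String value) {
        context(sessionId).put(key, value);
    }

    String get(long sessionId, String key) {
        return context(sessionId).get(key);
    }

    // Closing the connection discards its context.
    void closeSession(long sessionId) {
        sessions.remove(sessionId);
    }

    private Map<String, String> context(long sessionId) {
        Map<String, String> ctx = sessions.get(sessionId);
        if (ctx == null) {
            throw new IllegalStateException("no open session " + sessionId);
        }
        return ctx;
    }
}
```

For example, with a server default of `hive.execution.engine=mr`, calling `set(a, "hive.execution.engine", "tez")` on session `a` makes `get(a, ...)` return `"tez"`, while a second session `b` still sees `"mr"`.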

Compiler and Execution Engine

When a client submits a Hive query, HiveServer2 passes it to the compiler. The compiler parses the query, looks up the table metadata in the metastore, optimizes the query and produces an execution plan. The execution engine then runs that plan against the data in HDFS.
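The flow from query to execution can be sketched as a toy pipeline in Java. Everything here (`ToyQueryFlow`, `TableInfo` and the plan step names, which are only loosely modeled on Hive's operator names) is hypothetical and greatly simplified: the real Hive compiler parses HiveQL, builds an operator tree and applies many optimizations. The sketch only shows the ordering: metadata lookup, a semantic check, a plan that reads just the requested columns, then execution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy sketch only: none of these classes are Hive's real compiler API.
// Flow: look up metadata -> check the query -> build a plan -> execute it.
class ToyQueryFlow {
    // Minimal stand-in for what the metastore returns for a table.
    static final class TableInfo {
        final List<String> columns;
        final String location; // where the table's files live in HDFS

        TableInfo(List<String> columns, String location) {
            this.columns = columns;
            this.location = location;
        }
    }

    // "Compile": semantic check against the metadata, then a plan that only
    // reads the requested columns (a simplified form of column pruning).
    static List<String> compile(Map<String, TableInfo> metastore,
                                String table, List<String> selectCols) {
        TableInfo info = metastore.get(table);
        if (info == null) {
            throw new IllegalArgumentException("unknown table: " + table);
        }
        for (String col : selectCols) {
            if (!info.columns.contains(col)) {
                throw new IllegalArgumentException("unknown column: " + col);
            }
        }
        List<String> plan = new ArrayList<>();
        plan.add("TableScan " + info.location + " columns=" + selectCols);
        plan.add("Select " + selectCols);
        plan.add("FileSink client-result");
        return plan;
    }

    // "Execute": a real engine would run the plan as MapReduce, Tez or Spark
    // jobs over HDFS; this toy only prints the steps in order.
    static void execute(List<String> plan) {
        for (String step : plan) {
            System.out.println("run: " + step);
        }
    }
}
```

Note that the toy fails at compile time, before anything runs, when a column is not in the table's metadata; that is why the compiler needs the metastore before it can produce a plan.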

Metastore database

The metastore database is not part of HiveServer2 (and is not shown in the diagram). Every Hive installation needs a relational database for it, such as Derby (suitable for development environments only), Oracle or MySQL.

Hive stores the metadata of the databases and tables it manages in the metastore database. Note that this database does not hold the actual data; the data itself resides in HDFS.
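The split between metadata and data shows up in table locations: the metastore records a directory for each managed table, and the table's files live under that directory in HDFS. The sketch below assumes the classic default `hive.metastore.warehouse.dir` of `/user/hive/warehouse` (some distributions, especially Hive 3 based ones, use different paths). `WarehousePaths` is a hypothetical helper; on a real cluster, `DESCRIBE FORMATTED <table>;` shows the location the metastore has stored.

```java
// Sketch, assuming default settings: the metastore's record of a managed
// table points at an HDFS directory, while the rows themselves live in
// files under that directory, not in the metastore database.
class WarehousePaths {
    // Classic default value of hive.metastore.warehouse.dir.
    static final String DEFAULT_WAREHOUSE = "/user/hive/warehouse";

    // Default location of a managed table: tables in the "default" database
    // sit directly under the warehouse dir; other databases get "<db>.db".
    static String managedTableLocation(String warehouseDir, String db, String table) {
        String base = warehouseDir.endsWith("/")
                ? warehouseDir.substring(0, warehouseDir.length() - 1)
                : warehouseDir;
        if (db.equalsIgnoreCase("default")) {
            return base + "/" + table;
        }
        return base + "/" + db + ".db/" + table;
    }
}
```

With the default warehouse directory, a table `orders` in the `default` database maps to `/user/hive/warehouse/orders`, and a table of the same name in a database `sales` maps to `/user/hive/warehouse/sales.db/orders`.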

Metastore

The metastore service runs inside HiveServer2 and communicates with the configured metastore database to look up metadata about the databases and tables managed by Hive. (The metastore can also be deployed as a standalone service that HiveServer2 reaches over Thrift.)
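Whether the metastore runs in-process or as a separate service is controlled by the `hive.metastore.uris` property. When it is empty, the metastore code runs inside the client process (here, HiveServer2) and connects to the database directly; when it lists one or more comma-separated `thrift://host:port` URIs, clients talk to a remote metastore service (9083 is the usual default port). `MetastoreMode` is a hypothetical helper that only illustrates this rule; it is not part of Hive.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: classifies a metastore setup from the value of hive.metastore.uris.
// The property name is real; this class is a hypothetical helper.
class MetastoreMode {
    enum Mode { IN_PROCESS, REMOTE }

    // Empty or unset: the metastore runs inside the client process (for
    // example HiveServer2) and talks to the database directly.
    // Otherwise: the client connects to remote Thrift metastore service(s).
    static Mode modeFor(String metastoreUris) {
        return (metastoreUris == null || metastoreUris.trim().isEmpty())
                ? Mode.IN_PROCESS
                : Mode.REMOTE;
    }

    // The property may list several comma-separated thrift:// URIs.
    static List<String> parseUris(String metastoreUris) {
        List<String> uris = new ArrayList<>();
        if (metastoreUris == null) {
            return uris;
        }
        for (String part : metastoreUris.split(",")) {
            String uri = part.trim();
            if (uri.isEmpty()) {
                continue;
            }
            if (!uri.startsWith("thrift://")) {
                throw new IllegalArgumentException("expected a thrift:// URI: " + uri);
            }
            uris.add(uri);
        }
        return uris;
    }
}
```

Listing more than one URI is how clients can be pointed at several metastore instances; the helper simply returns them in the order given.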

Hive clients

The Hive CLI has been deprecated in favor of Beeline. Beeline connects to HiveServer2 and acts as the client interface through which users run queries and view results.

HiveServer2 also supports other clients that connect through ODBC or JDBC.
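A minimal JDBC client sketch in Java, assuming the `hive-jdbc` driver jar (driver class `org.apache.hive.jdbc.HiveDriver`) is on the classpath and a HiveServer2 instance is reachable. The host names, database, user and Kerberos principal are placeholders, and `Hs2JdbcClient` is a hypothetical class name. The URL follows HiveServer2's `jdbc:hive2://<host>:<port>/<db>` form with an optional `;principal=...` parameter for Kerberos; 10000 is HiveServer2's default port.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch of a JDBC client for HiveServer2. Needs the hive-jdbc driver on the
// classpath and a running HiveServer2; all names used with it are placeholders.
class Hs2JdbcClient {
    // URL shape: jdbc:hive2://<host>:<port>/<db>[;principal=<kerberos principal>]
    static String url(String host, int port, String db, String kerberosPrincipal) {
        StringBuilder sb = new StringBuilder("jdbc:hive2://")
                .append(host).append(':').append(port).append('/').append(db);
        if (kerberosPrincipal != null && !kerberosPrincipal.isEmpty()) {
            sb.append(";principal=").append(kerberosPrincipal);
        }
        return sb.toString();
    }

    // Opens a connection (HiveServer2 creates a session for it), runs one
    // query and prints the first column of each result row.
    static void runQuery(String jdbcUrl, String user, String password, String sql)
            throws SQLException, ClassNotFoundException {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

For example, `Hs2JdbcClient.runQuery(Hs2JdbcClient.url("localhost", 10000, "default", null), "hive", "", "SHOW TABLES")` should print one table name per line. The same URL is what you pass to Beeline with `beeline -u`.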

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.
