How does LATERAL VIEW work in Hive? - Big Data In Real World

How does LATERAL VIEW work in Hive?

How to fix Incompatible clusterIDs error during DataNode startup?
September 1, 2021
What do the numbers on the progress bar mean in Spark shell or Spark UI?
September 6, 2021
How to fix Incompatible clusterIDs error during DataNode startup?
September 1, 2021
What do the numbers on the progress bar mean in Spark shell or Spark UI?
September 6, 2021

LATERAL VIEW is a very powerful concept in Hive. It is used when we have to work with data with complex types. Let’s see this with an example.

Problem

Let’s say we have an employee table with employee name and an Array of department ids the employee belongs to.

 select * from employee;

+-----------------+---------------------+--+
| employee.ename  | employee.dept_list  |
+-----------------+---------------------+--+
| Tom             | [20]                |
| Jerry           | [10,20]             |
| Riley           | [20,30,40]          |
+-----------------+---------------------+--+

With the data above we would like to count the number of departments the employee belongs to. The output should look like below.

+--------+------+--+
| ename  | _c1  |
+--------+------+--+
| Jerry  | 2    |
| Riley  | 3    |
| Tom    | 1    |
+--------+------+--+

This is an easy problem to solve if the data we have is not nested. But with our data the dept_list is an Array of integers providing the list of departments the employee belongs to.

Solution

Lateral view is used in conjunction with user-defined table generating functions such as explode(). A UDTF generates zero or more output rows for each input row. 

Click here if you like to know the difference between UDF, UDAF and UDTF

A lateral view first applies the UDTF to each row of the base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias.

LATERAL VIEW can’t function alone. It needs to be used along with a UDTF. Here we are using explode() to first explode the array to individual rows. For the exploded data we are naming the table as depts with a column dept_id.

LATERAL VIEW joins resulting output exploded rows to the input rows from employee providing the below output.

SELECT ename, dept_id FROM employee
LATERAL VIEW explode(dept_list) depts AS dept_id;
+--------+----------+--+
| ename  | dept_id  |
+--------+----------+--+
| Tom    | 20       |
| Jerry  | 10       |
| Jerry  | 20       |
| Riley  | 20       |
| Riley  | 30       |
| Riley  | 40       |
+--------+----------+--+
6 rows selected (0.16 seconds)

We can enhance the above query to group by ename and count on the dept_id to get the number of departments an employee belongs to.

SELECT ename, count(dept_id) FROM employee
LATERAL VIEW explode(dept_list) depts AS dept_id
GROUP BY ename;
+--------+------+--+
| ename  | _c1  |
+--------+------+--+
| Jerry  | 2    |
| Riley  | 3    |
| Tom    | 1    |
+--------+------+--+
3 rows selected (30.312 seconds)
Big Data In Real World
Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.

1 Comment

  1. […] Check out this post if you need more information on LATERAL VIEW […]

How does LATERAL VIEW work in Hive?
This website uses cookies to improve your experience. By using this website you agree to our Data Protection Policy.

Hadoop In Real World is now Big Data In Real World!

X