What is the difference between UDF, UDAF and UDTF in Hive? - Big Data In Real World

What is the difference between UDF, UDAF and UDTF in Hive?

How to get a list of YARN applications that are currently running in a Hadoop cluster?
April 28, 2021
What is the difference between map and mapPartitions in Spark?
May 3, 2021

Hive has 3 different types of functions – User Defined Function (UDF), User Defined Aggregate Function (UDAF) and User Defined Table generating Function (UDTF).

User Defined Function (UDF)

UDFs operate on a single row and produce a single row as output.

length(STRING), round(DOUBLE, INT)  are examples of UDFs.

Assume you have a sales table with 10 rows. Applying the length() on a column in the table will still result in 10 rows.

SELECT length(col1) from sales;

User Defined Aggregate Function (UDAF)

UDAFs operate on multiple rows and produce a single row as output.

sum(col), min(col)  are couple of examples of UDAFs

Assume you have a sales table with 10 rows. Applying the sum() on a column in the table without a group by will still result in 1 row.

SELECT sum(quantity) from sales;

Note that UDAFs are aggregate functions so usually they are used with Group By clause.

SELECT sum(quantity) from sales;
group by category;

User Defined Table generating Function (UDTF)

UDTFs operate on single rows and produce multiple rows as output.

Functions like explode(Array), posexplode(Array) are examples of UDTFs.

Let’s look at an example of explode function in action.

Let’s say you have a table with one column which is of type array and has 1 row. The array has 3 elements – Tom, Jerry and Emily.

0: jdbc:hive2://ms2.hirw.com:2181,wk1.hirw.co> select array('Tom','Jerry','Emily');
+--------------------------+--+
|           _c0            |
+--------------------------+--+
| ["Tom","Jerry","Emily"]  |
+--------------------------+--+
1 row selected (0.631 seconds)

There are 2 flavors of explode, one flavor takes an Array and another takes a Map. In the below example explode function will take in an Array and explode the array into multiple rows. So if we have 3 elements in the array we will end up with 3 rows.

0: jdbc:hive2://ms2.hirw.com:2181,wk1.hirw.co> select explode(array('Tom','Jerry','Emily'));
+--------+--+
|  col   |
+--------+--+
| Tom    |
| Jerry  |
| Emily  |
+--------+--+
3 rows selected (0.774 seconds)

 

Big Data In Real World
Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.

2 Comments

  1. […] Click here if you like to know the difference between UDF, UDAF and UDTF […]

  2. […] Click here if you like to know the difference between UDF, UDAF and UDTF […]

gdpr-image
This website uses cookies to improve your experience. By using this website you agree to our Data Protection Policy.

Hadoop In Real World is now Big Data In Real World!

X