How to make Hive recursively read files from all the sub directories? - Big Data In Real World

How to make Hive recursively read files from all the sub directories?


Let’s say you have a Hive table whose location is a directory containing several subdirectories, and each subdirectory has files underneath it.

When you query the table, however, Hive reads only the files in the top-level folder and ignores all the files under the subdirectories.

Solution

There are two properties you need to make sure are set during Hive execution. If recursive directories/files are common in your environment, add the properties below to hive-site.xml rather than setting them at the application level.

SET hive.mapred.supports.subdirectories=TRUE; 
SET mapred.input.dir.recursive=TRUE;

Also note that these are not table-specific properties; they apply to the Hive execution environment.
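
For the environment-wide option, the same two properties could be added to hive-site.xml along the following lines. This is a minimal sketch, assuming the standard Hadoop-style `<property>` layout inside the file's existing `<configuration>` element; where hive-site.xml lives and whether Hive services need a restart to pick up the change depend on your installation.

```xml
<!-- Sketch only: place inside the existing <configuration> element of hive-site.xml -->
<property>
  <!-- Lets Hive treat subdirectories under a table location as input -->
  <name>hive.mapred.supports.subdirectories</name>
  <value>true</value>
</property>
<property>
  <!-- Tells the input format to descend into subdirectories recursively -->
  <name>mapred.input.dir.recursive</name>
  <value>true</value>
</property>
```

With these in place, sessions should not need the SET statements individually, although a SET issued in a session would still override the file's value for that session.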

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real-time streaming platforms. We have seen a wide range of real-world big data problems and implemented some innovative and complex (or simple, depending on how you look at it) solutions.
