How to save Spark DataFrame directly to a Hive table? - Big Data In Real World

How to save Spark DataFrame directly to a Hive table?


A very common use case is to process data in Spark and then save the resulting DataFrame directly to a Hive table.

There are a couple of ways to achieve this.


Solution 1

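Create a HiveContext from the existing Spark context. Here sc is a JavaSparkContext, so sc.sc() returns the underlying SparkContext. Note that HiveContext is deprecated since Spark 2.0, where a SparkSession built with enableHiveSupport() replaces it.

import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.hive.HiveContext;
HiveContext sqlContext = new HiveContext(sc.sc());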

df is the DataFrame you want to write to Hive. The line below writes the contents of df to the table sales in the database sample_db. Because the save mode is SaveMode.Overwrite, any existing contents of the table are replaced.

df.write().mode(SaveMode.Overwrite).saveAsTable("sample_db.sales");
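
On Spark 2.0 and later, the same write can be done through a SparkSession with Hive support enabled instead of a HiveContext. The following is a minimal sketch, not a definitive implementation: the class name SaveToHiveSketch, the method writeSample, and the two sample rows are assumptions made up for illustration, and it assumes the spark-sql and spark-hive libraries are on the classpath and that the job is launched with spark-submit (which supplies the master).

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class SaveToHiveSketch {

    // Writes two hypothetical rows to sample_db.sales and returns the
    // number of rows read back from the table.
    static long writeSample(SparkSession spark) {
        // Hypothetical schema and data, standing in for the processed result.
        StructType schema = new StructType()
                .add("item", DataTypes.StringType)
                .add("amount", DataTypes.DoubleType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("widget", 9.99),
                RowFactory.create("gadget", 24.50));
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // Make sure the target database exists before writing.
        spark.sql("CREATE DATABASE IF NOT EXISTS sample_db");

        // Same call as in Solution 1: Overwrite replaces existing contents.
        df.write().mode(SaveMode.Overwrite).saveAsTable("sample_db.sales");

        return spark.table("sample_db.sales").count();
    }

    public static void main(String[] args) {
        // enableHiveSupport() connects the session to the Hive metastore.
        SparkSession spark = SparkSession.builder()
                .appName("save-to-hive-sketch")
                .enableHiveSupport()
                .getOrCreate();

        System.out.println("rows written: " + writeSample(spark));
        spark.stop();
    }
}
```

Replacing SaveMode.Overwrite with SaveMode.Append would add the rows to the existing table instead of replacing its contents.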

 

Solution 2

Register the DataFrame df as a temporary view named temp_table in Spark.

df.createOrReplaceTempView("temp_table");

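The statement below creates a table named sales in the database sample_db by selecting the contents of temp_table. The new table has the same columns and data types as temp_table. Temporary views are session-scoped, so df must belong to the same session as sqlContext. Unlike Solution 1 with SaveMode.Overwrite, this statement fails if sample_db.sales already exists.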

sqlContext.sql("create table sample_db.sales as select * from temp_table");
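
If the database, table, or view names come from configuration rather than being hard-coded, the statement can be assembled by a small helper before it is passed to sqlContext.sql(...). The following is a minimal sketch under stated assumptions: CtasBuilder, ctas, and requireIdent are hypothetical names, not part of Spark's API, and the rule that identifiers contain only letters, digits, and underscores is a deliberately strict assumption to keep arbitrary text out of the SQL string.

```java
import java.util.regex.Pattern;

public class CtasBuilder {

    // Assumption: only plain identifiers (letters, digits, underscore) are allowed.
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Returns the name unchanged if it is a plain identifier, otherwise throws.
    static String requireIdent(String name) {
        if (name == null || !IDENT.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a plain identifier: " + name);
        }
        return name;
    }

    // Builds a CREATE TABLE ... AS SELECT statement like the one in Solution 2.
    public static String ctas(String database, String table, String sourceView) {
        return "CREATE TABLE " + requireIdent(database) + "." + requireIdent(table)
                + " AS SELECT * FROM " + requireIdent(sourceView);
    }

    public static void main(String[] args) {
        // The resulting string would be passed to sqlContext.sql(...).
        System.out.println(ctas("sample_db", "sales", "temp_table"));
    }
}
```

Here ctas("sample_db", "sales", "temp_table") produces the same statement as above, apart from keyword casing, which Spark SQL ignores.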

 

Big Data In Real World
