Apache Pig Tutorial - Ordering Records - Big Data In Real World

Apache Pig Tutorial – Ordering Records

The goal of this tutorial is to learn Apache Pig concepts at a fast pace, so don’t expect lengthy posts. All posts will be short and sweet. Most posts will have a (very short) “see it in action” video.

In the previous post we looked at how to group records, and we also found the average volume of stocks for the year 2003. In this post we will see how to order, or sort, records using Apache Pig.

First, let’s load the data, filter it to the year 2003, group it by stock symbol and find the average volume for each symbol.

grunt> stocks = LOAD '/user/hirw/input/stocks' USING PigStorage(',') as (exchange:chararray, symbol:chararray, date:datetime, open:float, high:float, low:float, close:float, volume:int, adj_close:float);

grunt> filter_by_yr = FILTER stocks by GetYear(date) == 2003;

grunt> grp_by_sym = GROUP filter_by_yr BY symbol;

grunt> avg_volume = FOREACH grp_by_sym GENERATE group, ROUND(AVG(filter_by_yr.volume)) as avgvolume;

Ordering Records

Use the ORDER operator to sort records by one or more fields. By default, records are sorted in ascending order. Add DESC after a field to sort by that field in descending order.

grunt> avg_vol_ordered = ORDER avg_volume BY avgvolume DESC;

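We can also order by multiple columns. In the instruction below, the records are ordered first by symbol (ascending, the default) and then by average volume (descending); DESC applies only to the field it follows. Here group refers to the symbol column, because the GROUP operator names the grouping key group.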

grunt> avg_vol_ordered = ORDER avg_volume BY group, avgvolume DESC;
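
The multi-column ordering above can be spelled out a little more explicitly. The sketch below continues the same grunt session, so it assumes grp_by_sym and avg_volume from the earlier steps already exist. The names avg_volume_named, symbol and by_symbol_then_volume are made up for illustration. It is meant to show two things: each sort key carries its own direction, and the grouping key can be given a friendlier name than group.

```pig
-- Inspect the schema: the grouping key produced by GROUP is named "group".
DESCRIBE avg_volume;

-- Optional: rename the grouping key so later statements can refer to "symbol".
avg_volume_named = FOREACH grp_by_sym GENERATE group AS symbol, ROUND(AVG(filter_by_yr.volume)) AS avgvolume;

-- ASC is the default and is written out only for clarity; DESC applies to avgvolume alone.
by_symbol_then_volume = ORDER avg_volume_named BY symbol ASC, avgvolume DESC;
```

Because grouping leaves one record per symbol, the second key never has to break a tie in this particular data set. It only matters when the first key contains duplicates.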

Display Results

grunt> DUMP avg_vol_ordered;
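
DUMP prints every record of the relation to the console. If only the first few records of the ordered result are of interest, one option is to apply LIMIT to the ordered relation before dumping it. This is a minimal sketch that assumes avg_vol_ordered from the step above; the name top10 and the count of 10 are arbitrary choices for illustration.

```pig
-- Keep only the first 10 records of the already ordered relation.
top10 = LIMIT avg_vol_ordered 10;

DUMP top10;
```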

