Hadoop Mapper and Reducer Output Type Mismatch - Big Data In Real World

Hadoop Mapper and Reducer Output Type Mismatch


Can you have different output Key Value pair types for Mapper and Reducer in a MapReduce program?

Short answer – absolutely yes.

Below are the Mapper and Reducer signatures from the same MapReduce program, and both are perfectly valid.

public class MaxClosePriceMapper extends Mapper<LongWritable, Text, Text, FloatWritable> 

public class MaxClosePriceReducer extends Reducer<Text, FloatWritable, FloatWritable, Text>
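For context, a reducer matching that signature could look like the sketch below. This is a hypothetical reconstruction (the article does not show the body), consistent with the signature and the stack trace that follows: it reads (Text, FloatWritable) pairs and deliberately flips the output to (FloatWritable, Text).

```java
import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical sketch: input (Text, FloatWritable), output (FloatWritable, Text).
public class MaxClosePriceReducer
        extends Reducer<Text, FloatWritable, FloatWritable, Text> {

    @Override
    public void reduce(Text key, Iterable<FloatWritable> values, Context context)
            throws IOException, InterruptedException {
        float maxClosePrice = Float.MIN_VALUE;
        for (FloatWritable value : values) {
            maxClosePrice = Math.max(maxClosePrice, value.get());
        }
        // Note the flipped output order: (price, symbol), not (symbol, price).
        context.write(new FloatWritable(maxClosePrice), key);
    }
}
```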

 

We know the above is valid. Yet when we execute the MapReduce program, it fails with the error below.

16/06/23 01:58:11 INFO mapreduce.Job: Task Id : attempt_1458616310472_2428_m_000002_0, Status : FAILED
Error: java.io.IOException: wrong key class: class org.apache.hadoop.io.FloatWritable is not class org.apache.hadoop.io.Text
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:196)
at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1307)
at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1624)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.reduce.WrappedReducer$Context.write(WrappedReducer.java:105)
at com.hirw.maxcloseprice.MaxClosePriceReducer.reduce(MaxClosePriceReducer.java:31)
at com.hirw.maxcloseprice.MaxClosePriceReducer.reduce(MaxClosePriceReducer.java:14)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1645)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)

What went wrong?

Here is the driver program. It looks OK, right? So what went wrong?

In the driver program below, we are using the same Reducer class as the Combiner. The Combiner runs on the map side, and its output key-value pairs are sent as the input to the Reducer.

Since we are reusing the Reducer as the Combiner, the Combiner's output key-value types are FloatWritable and Text. These do not match the Reducer's input key-value types, Text and FloatWritable, and hence the error.

Job job = new Job();
job.setJarByClass(MaxClosePrice.class);
job.setJobName("MaxClosePrice");

//Set input and output locations
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

//Set Input and Output formats
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);

//Set Mapper and Reduce classes
job.setMapperClass(MaxClosePriceMapper.class);
job.setReducerClass(MaxClosePriceReducer.class);

//Combiner (optional)
job.setCombinerClass(MaxClosePriceReducer.class);

job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(FloatWritable.class);

//Output types
job.setOutputKeyClass(FloatWritable.class);
job.setOutputValueClass(Text.class);
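The constraint itself has nothing to do with the cluster: a combiner's output is fed back into the shuffle as if it were map output, so its output types must equal the map output types. The plain-Java sketch below (a hypothetical stand-in, using String and Float in place of Text and FloatWritable) simulates the max-close-price flow; it only type-checks because combine() preserves its input types, while reduce() is free to flip them.

```java
import java.util.*;

public class CombinerTypeDemo {

    // Combiner stand-in: collapses each key's values to a local maximum.
    // Its output types (String, Float) must equal the map output types,
    // because the framework feeds combiner output back into the shuffle
    // as if it were map output.
    static Map<String, Float> combine(Map<String, List<Float>> mapOutput) {
        Map<String, Float> out = new TreeMap<>();
        mapOutput.forEach((symbol, prices) -> out.put(symbol, Collections.max(prices)));
        return out;
    }

    // Reducer stand-in: free to flip the types and emit (Float, String).
    static Map<Float, String> reduce(Map<String, List<Float>> shuffled) {
        Map<Float, String> out = new TreeMap<>();
        shuffled.forEach((symbol, prices) -> out.put(Collections.max(prices), symbol));
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Float>> mapOutput = new TreeMap<>();
        mapOutput.put("ABC", Arrays.asList(60.5f, 62.1f, 59.9f));
        mapOutput.put("XYZ", Arrays.asList(10.0f, 12.3f));

        // Run the combiner, then re-wrap its output as reducer input.
        // Substituting reduce() for combine() here would not compile,
        // which is the compile-time analog of the IOException above.
        Map<String, List<Float>> shuffled = new TreeMap<>();
        combine(mapOutput).forEach((symbol, max) ->
                shuffled.put(symbol, Collections.singletonList(max)));

        System.out.println(reduce(shuffled)); // {12.3=XYZ, 62.1=ABC}
    }
}
```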

So what is the solution?

Don’t reuse the Reducer as the Combiner if the Reducer’s input and output key-value pair types do not match. In the above program, simply comment out the line below and the program will work.

//job.setCombinerClass(MaxClosePriceReducer.class);
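If you still want the map-side aggregation a combiner provides, another option is to write a dedicated Combiner class whose output types match the map output types (Text and FloatWritable). A hypothetical sketch, computing the same local maximum:

```java
import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical combiner: input AND output are (Text, FloatWritable),
// so its output is valid input for MaxClosePriceReducer.
public class MaxClosePriceCombiner
        extends Reducer<Text, FloatWritable, Text, FloatWritable> {

    @Override
    public void reduce(Text key, Iterable<FloatWritable> values, Context context)
            throws IOException, InterruptedException {
        float max = Float.MIN_VALUE;
        for (FloatWritable value : values) {
            max = Math.max(max, value.get());
        }
        // Emit the same key with the local maximum; types are unchanged.
        context.write(key, new FloatWritable(max));
    }
}
```

Then register it in the driver with `job.setCombinerClass(MaxClosePriceCombiner.class);` instead of reusing the Reducer.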

 

Big Data In Real World
We are a group of Big Data engineers who are passionate about Big Data and related Big Data technologies. We have designed, developed, deployed and maintained Big Data applications ranging from batch to real time streaming big data platforms. We have seen a wide range of real world big data problems, implemented some innovative and complex (or simple, depending on how you look at it) solutions.
