1. Sorting the values before they are sent to the reducer

I'm thinking about building a small test application in Hadoop to get the hang of the system. The application I have in mind will be in the realm of statistics. I want ...
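
One common way to have values reach a reducer already sorted is a "secondary sort": the value is folded into the sort key, while grouping still happens on the original key alone. The following is a plain-Python simulation of that idea, not Hadoop API code; the function name `shuffle_with_secondary_sort` and the sample records are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_with_secondary_sort(pairs):
    """Simulate a shuffle that sorts on (key, value) but groups on key only.

    `pairs` is an iterable of (key, value) tuples as a mapper might emit them.
    Returns a list of (key, [values in ascending order]) tuples.
    """
    # Sorting on the full (key, value) tuple plays the role of a composite sort key.
    ordered = sorted(pairs)
    # Grouping on the first field only plays the role of a grouping comparator.
    return [
        (key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


if __name__ == "__main__":
    # Hypothetical mapper output: (sensor name, reading)
    emitted = [("b", 7), ("a", 3), ("a", 1), ("b", 2), ("a", 2)]
    for key, values in shuffle_with_secondary_sort(emitted):
        print(key, values)
```

In an actual Hadoop job the same effect typically requires a composite key, a partitioner that looks only at the natural key, and a grouping comparator; the sketch above only shows the ordering logic.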

2. Hadoop: Do not re-schedule a failed reducer

This is how Hadoop currently works: If a reducer fails (throws a NullPointerException for example), Hadoop will reschedule another reducer to do the task of the reducer that failed. Is it possible ...

3. hadoop + Writable interface + readFields throws an exception in reducer

I have a simple map-reduce program in which my map and reduce primitives look like this:
map(K,V) = (Text, OutputAggregator)
reduce(Text, OutputAggregator) = (Text,Text)
The important point is that from my map function I ...

4. Hadoop reducer string manipulation doesn't work

Hi, text manipulation in the Reduce phase does not seem to work correctly. I suspect the problem could be in my code rather than Hadoop itself, but you never know... If you can spot any gotchas let ...

5. Accessing a mapper's counter from a reducer

I need to access the counters from my mapper in my reducer. Is this possible? If so how is it done? As an example: my mapper is: public class CounterMapper extends Mapper ...

6. Do something to the entire Reducer values list based on one element

I have an interesting problem that I'm struggling to fit in MapReduce. I have a bunch of log entries. What I need to do is something like this: Check if any entry ...
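
Because a reducer sees the values for a key as a one-pass stream, one straightforward approach to this kind of problem is to buffer the values, check the condition, and only then decide what to emit. Below is a minimal sketch in plain Python rather than Hadoop code. The names `reduce_with_lookahead`, `is_trigger`, and `transform`, as well as the sample log strings, are hypothetical stand-ins for whatever check the question describes, and the sketch assumes the values for one key fit in memory.

```python
def reduce_with_lookahead(key, values, is_trigger, transform):
    """Buffer all values for `key`; transform every one if any value triggers.

    `values` may be a one-shot iterator, so it is copied into a list first.
    Returns a list of (key, value) pairs to emit.
    """
    buffered = list(values)
    if any(is_trigger(v) for v in buffered):
        # At least one element matched, so the whole group is rewritten.
        return [(key, transform(v)) for v in buffered]
    # No element matched, so the group passes through unchanged.
    return [(key, v) for v in buffered]


if __name__ == "__main__":
    # Hypothetical log entries for one key
    entries = iter(["login ok", "ERROR disk full", "logout"])
    out = reduce_with_lookahead(
        "host-1",
        entries,
        is_trigger=lambda line: line.startswith("ERROR"),
        transform=lambda line: "flagged: " + line,
    )
    for pair in out:
        print(pair)
```

If a group can be too large to buffer, a secondary sort that moves the deciding element to the front of the values is a frequently used alternative.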

7. Hadoop, how to compress mapper output but not the reducer output

I have a MapReduce Java program in which I try to compress only the mapper output but not the reducer output. I thought that this would be possible by setting ...

8. Hadoop: Mapper is getting executed but reducer is not

Basically I want to do the following:

  1. Write mapper and reducer classes with map and reduce functions in a Java application external to the Hadoop environment.
  2. Then send this job to Hadoop, which will use the mapper and reducer ...

9. null buffer in ReadFields() in reducer for complex types

I am trying to pass a complex writable between mapper and reducer, more specifically an ArrayWritable of ObjectWritables. public class ObjectArrayWritable extends ArrayWritable { ...

10. Hadoop streaming reducer doesn't produce any output when input is empty

The following Hadoop streaming behavior seems unintuitive and undocumented, if not outright a bug: Mapper:

#!/usr/bin/env python
# exhaust the input without producing any output
import sys
for line in sys.stdin:
    pass

Reducer:

#!/usr/bin/env python
import ...
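
For context, here is a minimal sketch of a streaming-style reducer script that always writes one summary line, even when its standard input is empty. It is not the reducer from the question (that code is cut off above); the function names `count_lines` and `main` and the output key `lines` are made up for illustration. Whether Hadoop streaming actually launches the reducer process for an empty input is a separate matter that this sketch does not settle.

```python
import sys


def count_lines(stream):
    """Return the number of lines in `stream` (0 if it is empty)."""
    total = 0
    for _ in stream:
        total += 1
    return total


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Always write exactly one tab-separated line, even when no input arrives.
    stdout.write("lines\t%d\n" % count_lines(stdin))


if __name__ == "__main__":
    main()
```

Run on its own with an empty stdin, the script still emits a single `lines` record with a count of zero, which makes it easy to tell "the reducer ran on no input" apart from "the reducer never ran".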

11. Hadoop Looping the Reducer

I am trying to find a way to "loop" my reducer, for example:

for(String document: tempFrequencies.keySet())
testMap.put(key.toString(), DF.format(tfIDF));
//This allows me to create a hashmap which I plan to write out to context as ...

12. concatenating reducer outputs from SequenceFileOutputFormat

I've got a job that uses 100 reducers configured with setOutputFormat(SequenceFileOutputFormat.class). After the job runs, can I combine all of the part files via the following command and have things work correctly with the ...

13. Hadoop 0.20.2 reducer throws ArrayIndexOutOfBoundsException when iterating values

I am fairly new to Hadoop; however, I've been reading "Hadoop: The Definitive Guide", so I think I have an understanding of the basic concepts. I used Hadoop 0.20.2 to run a ...

14. Hadoop Reducer unable to accumulate all values in one iteration

I have a basic scenario in Hadoop: All mappers send all values to the same key. Therefore all values end up on the same reducer. However, when I iterate the values in the ...