reduce « hadoop « Java Database Q&A

1. Should map() and reduce() return key/value pairs of the same type?    stackoverflow.com

When writing a MapReduce job (specifically Hadoop if relevant), one must define a map() and a reduce() function, both yielding a sequence of key/value pairs. The data types of the key ...
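
In the Java API the map output types and the reduce output types do not have to match; only the map output types must line up with the reduce input types. A minimal sketch, assuming the Hadoop 0.20+ "mapreduce" API (class names here are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class TypeExample {
        // The mapper emits (Text, IntWritable) pairs ...
        public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(new Text(value.toString()), new IntWritable(1));
            }
        }

        // ... while the reducer consumes (Text, IntWritable) and emits (Text, LongWritable).
        public static class MyReducer extends Reducer<Text, IntWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }
    }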

2. Hadoop one Map and multiple Reduce    stackoverflow.com

We have a large dataset to analyze with multiple reduce functions. All the reduce algorithms work on the same dataset generated by the same map function. Reading the large dataset costs too much ...

3. Hadoop Reduce Error    stackoverflow.com

I keep getting Exceeded MAX_FAILED_UNIQUE_FETCHES; in the reduce phase even though I tried all the solutions I could find online. Please help me, I have a project presentation in three ...

4. What's the easiest way to explain What is Hadoop and Map/Reduce?    stackoverflow.com

It's very easy to explain NoSQL from a high-level view - it is basically "key-value" storage. Of course there are a thousand minor and important things, but in general it's just key-value ...

5. Hadoop: How to find out the partition_Id in reduce step using Context object    stackoverflow.com

In Hadoop API version 0.20 and above, the Context object was introduced instead of JobConf. Using the Context object, I need to find out 1) the partition_id for the current Reducer and 2) the output folder. Using ...
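
As a rough sketch with the Context-based API (the class name and key/value types below are placeholders): the reduce partition number can be read from the task ID carried by the Context, and the configured output directory from FileOutputFormat, for example inside Reducer.setup():

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PartitionAwareReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void setup(Context context) {
            // Partition number of this reduce task (matches the nnnnn in its part file name).
            int partitionId = context.getTaskAttemptID().getTaskID().getId();
            // Output directory configured for the job.
            Path outputDir = FileOutputFormat.getOutputPath(context);
            System.err.println("partition=" + partitionId + ", output dir=" + outputDir);
        }
    }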

6. Hadoop Spill failure    stackoverflow.com

I'm currently working on a project using Hadoop 0.21.0, 985326, and a cluster of 6 worker nodes and a head node. Submitting a regular MapReduce job fails, but I have no idea ...

7. Using Hadoop for the First Time, MapReduce Job does not run Reduce Phase    stackoverflow.com

I wrote a simple MapReduce job that would read in data from the DFS and run a simple algorithm on it. When trying to debug it, I decided to simply ...

8. Hadoop counters: how to access the Reporter object outside map() and reduce()    stackoverflow.com

To use counters I need to have access to the Reporter object. The Reporter object is passed as a parameter to map() and reduce(), hence I can do: reporter.incrCounter(NUM_RECORDS, 1); But I need ...
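
With the old "mapred" API a common workaround is to stash the Reporter handed to map()/reduce() in an instance field and reuse it later. With the newer Context-based API, counters are reachable outside map() as well, e.g. in setup()/cleanup(). A sketch under that assumption (the counter and class names are made up for illustration):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CountingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        enum MyCounters { NUM_RECORDS }   // hypothetical counter

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.getCounter(MyCounters.NUM_RECORDS).increment(1);
            context.write(new Text(value.toString()), new IntWritable(1));
        }

        @Override
        protected void cleanup(Context context) {
            // The same counter is accessible here, outside map().
            long total = context.getCounter(MyCounters.NUM_RECORDS).getValue();
            System.err.println("records seen by this mapper: " + total);
        }
    }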

9. What is the maximum number of records that a hadoop reducer's reduce() call can take?    stackoverflow.com

I have a mapper whose output is mapped to multiple different reducer instances by using my own Partitioner. My partitioner makes sure that a given key is always sent to a given ...

10. Hadoop Map Reduce Program    stackoverflow.com

When I was trying the MapReduce programming example from the Hadoop in Action book, based on the Hadoop 0.20 API, I got the error java.io.IOException: Type mismatch in value from map: expected ...
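
This error usually means the types the job is configured with do not match what the mapper actually emits. When the map output types differ from the job's final output types, they typically need to be declared explicitly in the driver; a sketch (the concrete type classes are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class Driver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "type example");
            // Intermediate (map output) types: must match what the Mapper writes.
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            // Final (reduce output) types.
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            // ... setMapperClass/setReducerClass and input/output paths go here ...
        }
    }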

11. Write arbitrary map and reduce function    stackoverflow.com

I want to write my own map and reduce functions in the MapReduce framework. How can I do that? (My programming language is Java.) Thanks.

12. Separating Hadoop Map and Reduce tasks    stackoverflow.com

In a 3-node Hadoop cluster, I would like the master to be one node, with map tasks taking place on one node and reduce tasks on one node. Map and reduce ...

13. merge output files after reduce phase    stackoverflow.com

In MapReduce, each reduce task writes its output to a file named part-nnnnn, where nnnnn is the partition ID associated with the reduce task. Does map/reduce merge these files? If yes, how? ...
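
The framework itself leaves one part-nnnnn file per reduce task; they are not merged automatically. They can be concatenated afterwards, for example with "hadoop fs -getmerge <dir> <localfile>" or, from Java, with FileUtil.copyMerge (present in older Hadoop releases; the paths below are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeParts {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // Concatenate every file under the job output directory into a single file.
            FileUtil.copyMerge(fs, new Path("/user/me/job-output"),
                               fs, new Path("/user/me/merged.txt"),
                               false /* keep the part files */, conf, null);
        }
    }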

14. Compare and join two datasets using CompositeInputFormat in hadoop map/reduce    stackoverflow.com

I have a question regarding joins in Map/Reduce. If I want to do an inner join in Hadoop Map/Reduce, how would I do it? I have heard of CompositeInputFormat but haven't found much ...

15. HADOOP: emitting a Matrix from a mapper    stackoverflow.com

Hi everyone, I am new to Hadoop MapReduce. I wanted to know whether there is some OutputFormat type that can allow me to emit a matrix (2D array) directly from the mapper ...

16. Hadoop, MapReduce chaining    stackoverflow.com

I have to implement the following: map --> Reduce1 --> Reduce2, meaning Reduce2 is a separate operation on the output of Reduce1. I want to get the values emitted by Reduce1 and ...
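
Since a single job has at most one reduce phase, the usual pattern is two chained jobs: the second job reads the output directory of the first (often with the identity Mapper) and runs the second reducer. A hedged driver sketch, where Reduce1/Reduce2 and the paths stand in for your own classes and locations:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class ChainDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path input = new Path(args[0]);
            Path intermediate = new Path(args[1]);   // output of Reduce1, input of Reduce2
            Path output = new Path(args[2]);

            Job job1 = new Job(conf, "map + reduce1");
            // job1.setMapperClass(MyMap.class); job1.setReducerClass(Reduce1.class);
            FileInputFormat.addInputPath(job1, input);
            FileOutputFormat.setOutputPath(job1, intermediate);
            if (!job1.waitForCompletion(true)) System.exit(1);

            Job job2 = new Job(conf, "reduce2");
            job2.setMapperClass(Mapper.class);       // base Mapper acts as an identity mapper
            // job2.setReducerClass(Reduce2.class);
            FileInputFormat.addInputPath(job2, intermediate);
            FileOutputFormat.setOutputPath(job2, output);
            System.exit(job2.waitForCompletion(true) ? 0 : 1);
        }
    }

The input format of the second job has to match how the first job wrote its output (for example, SequenceFileOutputFormat in job1 paired with SequenceFileInputFormat in job2 keeps the key/value types intact).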

17. How to save only non-empty reducers' output in HDFS    stackoverflow.com

In my application the reducer saves all the part files in HDFS, but I want the reducer to write only the part files whose sizes are not 0 bytes. Please let me know ...
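
One common approach, assuming a reasonably recent Hadoop release, is LazyOutputFormat: it only creates a reducer's part file when the first record is actually written, so reducers that emit nothing leave no empty file behind. Driver-side sketch:

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class LazyOutputConfig {
        static void configure(Job job) {
            // Wrap the real output format; reducers with no output produce no part file.
            LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
        }
    }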

18. Hadoop: Set slave as explicit reducer?    stackoverflow.com

We use a Hadoop multi-node setup on Debian + Ubuntu with the latest stable Hadoop release. Is it possible to set a specific slave to be the reducer? I just use ...

19. Implementing third phase called merge after Reduce phase    stackoverflow.com

I need to add a third phase – merge – which combines the outputs of separate, parallel Reduce tasks. This makes it possible to do things like joins and build Cartesian products. Can ...

20. A Hadoop job completes without map and reduce on a Hadoop cluster (one namenode, 12 datanodes)    stackoverflow.com

description

I wrote a Hadoop program and ran it on a single machine, and it worked well. But it ran into the problems below (the job didn't start and finished immediately after the map started) when I migrated it ...

21. From "reduce input records" to "reduce input groups"    stackoverflow.com

After running a MapReduce job, we get a summary about the job, for example:

...
reduce input records: 10
reduce input groups: 3
...
I know this is caused by combining repeated keys. My question ...
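
For reference, "reduce input groups" counts distinct keys, i.e. the number of reduce() invocations, while "reduce input records" counts the individual values fed to them; 10 records spread over 3 distinct keys therefore show up as 3 groups. A small sketch of where each counter comes from (types are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class GroupReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Each call to reduce() corresponds to one "input group" (one distinct key).
            int records = 0;
            for (IntWritable v : values) {
                records++;   // each value iterated here is one "input record"
            }
            context.write(key, new IntWritable(records));
        }
    }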

22. Hadoop reduce task gets hung    stackoverflow.com

I set up a Hadoop cluster with 4 nodes. When running a map-reduce task, the map task finishes quickly, while the reduce task hangs at 27 percent. I checked the log, ...

23. Setting the number of map tasks and reduce tasks    stackoverflow.com

I am currently running a job. I fixed the number of map tasks to 20 but am getting a higher number. I also set the number of reduce tasks to zero, but I ...
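
The reduce-task count is honored exactly, but the map-task count is only a hint: the actual number of map tasks follows from the number of input splits (input size and block size). A small configuration sketch (the old-style property name below is an assumption about the release in use):

    import org.apache.hadoop.mapreduce.Job;

    public class TaskCountConfig {
        static void configure(Job job) {
            // Honored exactly; 0 means no reduce phase, map output goes straight to HDFS.
            job.setNumReduceTasks(0);
            // Only a hint; the split computation decides the actual number of map tasks.
            job.getConfiguration().setInt("mapred.map.tasks", 20);
        }
    }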

24. Why are all the reduce tasks ending up on a single machine?    stackoverflow.com

I wrote a relatively simple map-reduce program on the Hadoop platform (Cloudera distribution). Each map and reduce task writes some diagnostic information to standard output besides its regular map-reduce work. However, when I'm ...

25. How to deal with unbalanced input of reduce task?    stackoverflow.com

Recently I was asked how to deal with the unbalanced input of a reduce task. I thought for a while and tried to redistribute the data, but didn't come up with a good solution. ...

26. Hadoop: Why might a furiously writing reduce task be timed out?    stackoverflow.com

I have a Hadoop reduce task that reads its input records in batches and does a lot of processing and writes a lot of output for each input batch. I ...

27. How to pass objects from Client to Map and Reduce?    stackoverflow.com

Should the class extend the ObjectWritable class? Then how can I pass it from the client to the Map and Reduce? Thanks.
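
Writable wrappers such as ObjectWritable are for the keys and values flowing through the job itself; for a small client-side parameter the usual route is to put it into the job Configuration and read it back in setup() (larger payloads typically go through the DistributedCache). A sketch with a made-up property name:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ParamExample {
        // Client side: store the parameter before submitting the job.
        static void clientSide(Configuration conf) {
            conf.setInt("my.threshold", 42);   // hypothetical property name
        }

        public static class ParamMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private int threshold;

            @Override
            protected void setup(Context context) {
                // Task side: read the value back from the same Configuration.
                threshold = context.getConfiguration().getInt("my.threshold", 0);
            }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                if (value.getLength() > threshold) {
                    context.write(new Text(value.toString()), new IntWritable(1));
                }
            }
        }
    }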

28. In Map/Reduce, could only the reduce be restarted?    stackoverflow.com

Is it possible to restart only the reduce phase in a map/reduce job? My guess is 'No', but I just want to see if someone has other thoughts about it. Thanks.

29. Is there a way to access number of successful map tasks from a reduce task in an MR job?    stackoverflow.com

In my Hadoop reducers, I need to know how many successful map tasks were executed in the current job. I've come up with the following, which as far as I ...