mapreduce 2 « hadoop « Java Database Q&A

1. How to print on console during MapReduce job execution in hadoop stackoverflow.com

I want to print each step of my "map" after its execution on the console. Something like System.out.println("Completed Step one"); System.out.println("Completed Step two"); and so on. Is there a special command to do that or ...

2. Using MultipleTextOutputFormat to control output filename in MapReduce stackoverflow.com

Hadoop (and Java) neophyte here. I need some help with using MultipleTextOutputFormat to control the output filename in MapReduce. Currently I am using it this way. And it seems to work ...

3. Best practices for using Oozie for Hadoop stackoverflow.com

I have been using Hadoop quite a while now. After some time I realized I need to chain Hadoop jobs, and have some type of workflow. I decided to use Oozie ...

4. shared variable in map reduce stackoverflow.com

I need a variable that is shared between reduce tasks and that each reduce task can read and write atomically. The reason that I need such a variable is to give ...

5. Possible to override the context.write() method in ReduceContext? stackoverflow.com

Using 0.20.2... Is it possible to override the context.write() method in ReduceContext? I have an entire set of Reducers that I would like to all use a specific function just before ...

6. Hadoop mapreduce programming stackoverflow.com

How do I get sorted output using Hadoop MapReduce programming? Is there any way to get the final key-value pairs in sorted order (either by key or by value)? Any pointers on this ...

7. Error while running Mapreduce program stackoverflow.com

I am getting the following error while running a MapReduce program.

The program sorts the output using TotalOrderPartitioner.

I have a 2-node cluster.
When I run the program with -D mapred.reduce.tasks=2 its ...

8. Custom MapReduce Input Format - Can't Find Constructor stackoverflow.com

I'm writing a custom InputFormat for Hadoop 0.20.2 and am running into a NoSuchMethodException I can't get rid of. I started with:

public class ConnectionInputFormat extends FileInputFormat<Text, Connection> {

   ...

9. Controlling precision with Hadoop FloatWritable stackoverflow.com

Recently, I worked on a MapReduce application where my target was to calculate the average of a certain set of values (all long integers). Also, the application demanded writing out ...

10. Is implementing the RawComparator really that much faster? stackoverflow.com

Is implementing the RawComparator that much faster than extending WritableComparator? Looking at Text/LongWritable/etc, and their built-in comparators, it seems that they basically just read in the fields directly from the full ...

11. Extreme amount of overhead in simple MapReduce job stackoverflow.com

I'm experimenting with Hadoop and created a very simple map and reduce job. The input is a 30 line text file, and the output is only 3 lines (it's an excerpt ...

12. Calculating ranking in Map/Reduce stackoverflow.com

I have a simple problem which is hard to solve in SQL and I'm wondering if it can be done in a map-reduce system. I want to produce rankings. Imagine Amazon ...

13. Why is the right number of reduces in Hadoop 0.95 or 1.75? stackoverflow.com

The hadoop documentation states:

The right number of reduces seems to be 0.95 or 1.75 multiplied by (<no. of nodes> * mapred.tasktracker.reduce.tasks.maximum). With 0.95 all of the reduces ...
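
A rough sketch of the arithmetic in the quoted rule, assuming the factor missing from the excerpt is the number of nodes in the cluster. The class and method names are hypothetical, and rounding down is an assumption; the quoted documentation does not say how to round.

```java
// Hypothetical helper, not part of Hadoop: applies the quoted rule of thumb
// factor * (nodes * mapred.tasktracker.reduce.tasks.maximum).
final class ReducerCountSketch {

    // nodes: number of worker nodes (assumed meaning of the missing factor)
    // reduceSlotsPerNode: value of mapred.tasktracker.reduce.tasks.maximum
    // factor: 0.95 or 1.75, as in the quoted documentation
    static int suggestedReduces(int nodes, int reduceSlotsPerNode, double factor) {
        // Rounding down is an assumption made for this sketch.
        return (int) Math.floor(factor * nodes * reduceSlotsPerNode);
    }

    public static void main(String[] args) {
        int nodes = 10;             // illustrative cluster size
        int reduceSlotsPerNode = 2; // illustrative slot count
        System.out.println("0.95 -> " + suggestedReduces(nodes, reduceSlotsPerNode, 0.95));
        System.out.println("1.75 -> " + suggestedReduces(nodes, reduceSlotsPerNode, 1.75));
    }
}
```

The result would then be passed to the job, for example through -D mapred.reduce.tasks=N on the command line.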

14. How to specify tab as a record separator for hadoop input text file? stackoverflow.com

The input file to my hadoop M/R job is a text file in which the records are separated by tab character '\t' instead of newline '\n'. How can I instruct hadoop ...

15. How to start learning hadoop stackoverflow.com

I am a Web developer. I have experience in Web technologies like JavaScript, jQuery, PHP, HTML. I know basic concepts of C. Recently I had taken interest ...

16. What is the best way of reading/writing binary input/output files with MapReduce? stackoverflow.com

In all samples I've seen so far, mapreduce apps take text files as input and write text as output. I'd like my app to read objects from the binary file and write ...

17. How to tell MapReduce how many mappers to use? stackoverflow.com

I am trying to optimize the speed of a MapReduce job. Is there any way I can tell Hadoop to use a particular number of mapper/reducer processes? Or, at least, a minimum number of mapper processes? In ...

18. How to set hadoop input format to NLineInputFormat? stackoverflow.com

I am trying to limit the number of lines each of the Mappers gets. My code goes like this:

    package com.iathao.mapreduce;

    import java.io.IOException;
    ...

19. How to use the MultipleTextOutputFormat class to rename the default output file to some meaningful names? stackoverflow.com

After the reduce phase in Hadoop, I wanted the output file names to be something meaningful depending on the input key value. However I'm not successful in following the example on ...

20. How to tell MapReduce how many mappers to use at the same time? stackoverflow.com

I am writing an indexing app for MapReduce. I was able to split inputs with NLineInputFormat, and now I've got a few hundred mappers in my app. However, only 2/machine of those are ...

21. Anomaly detection using mapreduce stackoverflow.com

I'm new to Apache Hadoop and I'm really looking forward to exploring more of its features. After the basic wordcount example I wanted to up the ante a little bit. So ...

22. How to pull data in the Map/Reduce functions? stackoverflow.com

According to Hadoop: The Definitive Guide:

The new API supports both a “push” and a “pull” style of iteration. In both APIs, key-value record pairs are ...

23. Map reduce value list order problem stackoverflow.com

As we know, Hadoop groups values per key and sends them to the same reduce task. Suppose I have the following lines in a file on HDFS: line1 line2 line3 .... linen. In the map task I print the filename and line. In ...

24. How to set setMaxMapTaskFailuresPercent in hadoop's new api? stackoverflow.com

Before, you could set max failures percent by using

JobConf.setMaxMapTaskFailuresPercent(int)

now, that's obsolete.

 job.getConfiguration().set("mapred.max.map.failures.percent", "100");

doesn't seem to work as well. What is the proper way of doing this in new hadoop api?

25. Is it better to use the mapred or the mapreduce package to create a Hadoop Job? stackoverflow.com

To create MapReduce jobs you can either use the old org.apache.hadoop.mapred package or the newer org.apache.hadoop.mapreduce package for Mappers and Reducers, Jobs ... The first one had been marked as deprecated ...

26. Converting a normal java program to map reduce stackoverflow.com

I want to write a Java wrapper which will convert 'compatible programs' into map reduce form to be executed by the Hadoop framework. I am aware that my question is quite ambiguous. But ...

27. Hadoop : java.io.IOException: No valid local directories in property: mapred.local.dir stackoverflow.com

when I run the hadoop job it fails with the following stacktrace:

11/10/06 13:12:49 INFO mapred.FileInputFormat: Total input paths to process : 1
11/10/06 13:12:49 INFO mapred.JobClient: Cleaning up the staging ...

28. Hadoop node & core allocation strategy stackoverflow.com

I have a cluster with 50 nodes and each node has 8 cores for computation. If I have a job on which I'm planning to impose 200 reducers, what would be a good computational ...

29. hadoop-streaming : writing output to different files stackoverflow.com

Here is the scenario

           Reducer1  
         /  
Mapper - ...

30. Get a org.apache.hadoop.mapreduce.Job from a job already completed on the JobTracker stackoverflow.com

I'm using org.apache.hadoop.mapreduce.Job to create/submit/run a MR Job (Cloudera3, 20.2), and after it completes, in a separate application, I'm trying to get the Job to grab the counters to do some ...

31. Difference between Memcached and Hadoop? stackoverflow.com

What is the basic difference between Memcached and Hadoop? Microsoft seems to do memcached with the Windows Server AppFabric. I know memcached is a giant key value hashing function using multiple servers. ...

32. How we can force many mappers read one specific file (same data) in hadoop? stackoverflow.com

I want to write a program in which many mappers read one file that is a graph, and all do processing on that graph. The file is about 14 KB; if I ...

33. Hadoop : JPS can not find Java installed stackoverflow.com

my configurations are

hduser@worker1:/usr/local/hadoop/conf$ jps
The program 'jps' can be found in the following packages:
 * openjdk-6-jdk
 * openjdk-7-jdk
Ask your administrator to install one of them

I have java installed though

hduser@worker1:/usr/local/hadoop/conf$ java -version
java version ...

34. Custom partitioner example stackoverflow.com

I am trying to write a new Hadoop job for input data that is somewhat skewed. An analogy for this would be the word count example in the Hadoop tutorial, except let's ...

35. migrating computation to the cloud stackoverflow.com

Is there any automatic tool with which I can migrate legacy uniprocessor programs to the cloud, meaning that the target program is ready to execute in the cloud (e.g. programs written for ...

36. Serialization using ArrayWritable seems to work in a funny way stackoverflow.com

I was working with ArrayWritable, and at some point I needed to check how Hadoop serializes the ArrayWritable. This is what I got by setting job.setNumReduceTasks(0):

0    IntArrayWritable@10f11b8
3    IntArrayWritable@544ec1
6    IntArrayWritable@fe748f
8 ...

37. hadoop mapreduce error stackoverflow.com

Hello,
I am facing an error when running Hadoop in a MapReduce environment in Eclipse: An internal error occurred during: "Refresh DFS Children".

org.eclipse.team.internal.ccvs.ssh2.CVSSSH2Plugin.getPreferenceStore()Lorg/eclipse/jface/preference/IPreferenceStore.

...

38. How to share global sequential number generator in Hadoop? stackoverflow.com

Now I am using Hadoop to process the data that will finally be loaded into the same table. I need a shared sequential number generator to generate an id for each ...

39. Specifying text/string types as value for Hadoop counters stackoverflow.com

The current methods to set/increment hadoop counters only take in long values. eg: increment(long incr) and setValue(long value) are two methods I pulled out from the Hadoop Javadocs. My requirement is to store ...

40. Hadoop CouchDB Elastic Search stackoverflow.com

I have already installed CouchDB (ver 1.1.0), Elastic Search (0.17.6) on my Fedora. I want now to install Hadoop Map/reduce (http://hadoop.apache.org/mapreduce/) and Hadoop DFS (http://hadoop.apache.org/hdfs/) on this machine but I wonder ...

41. Add a progress tracking mechanism to hadoop MapReduce Cleanup stackoverflow.com

Let's say I'm using cleanup() functions in Hadoop MapReduce. How would I add a progress tracking mechanism inside it, let's say in percentage complete, to display it in console?

42. How to store the map-reduced output on different nodes? stackoverflow.com

I want to store categorized data on different nodes in Hadoop, e.g.:

Node - 1 >> Animal.txt
Node - 2 >> Sports.txt
Node - 3 >> Life.txt
.
.
.
Node - n >> nnnnn.txt

Is there a way ...

43. What is the loading order of the configuration files in hadoop? stackoverflow.com

I use the following program to rename a directory, but I got an exception, which suggests it assumes that I am using the local file system. Actually, in my ...

44. How to customize Writable class in Hadoop? stackoverflow.com

I'm trying to implement a Writable class, but I have no idea how to do so when my class contains nested objects, such as a list. Could ...

45. Not able to stop Hadoop IPC service stackoverflow.com

I am using Hadoop IPC to create a sequential number generating service, but I am not able to stop the server when the program exits. Could anybody help me?

import java.io.File;
import java.io.IOException;
import java.net.InetAddress;
import java.util.HashMap;
import ...

46. Calculate Means and Standard Deviation by columns in Hadoop stackoverflow.com

I want to calculate means and standard deviations by column in Hadoop. I simply adapted the single-pass naïve algorithm to MapReduce. I tested it on multivariate data sets 455000x90 and 650000x120 and got ...
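
One alternative to the naive single-pass formula, sketched under assumptions: keep a (count, mean, sum of squared deviations) triple per column, update it with Welford's method, and merge partial triples with the pairwise formula of Chan et al. This is not the asker's code. ColumnStats is a hypothetical plain-Java class; in a real job it would need to be wrapped in a Writable, with one instance per column, and merged in a combiner or reducer.

```java
// Hypothetical mergeable statistics for ONE column (plain Java, no Hadoop types).
final class ColumnStats {
    long n;      // number of values seen
    double mean; // running mean
    double m2;   // running sum of squared deviations from the mean

    // Welford's update for a single value (what a mapper would do per record).
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    // Pairwise merge of two partial results (what a combiner/reducer would do).
    void merge(ColumnStats other) {
        if (other.n == 0) return;
        if (n == 0) { n = other.n; mean = other.mean; m2 = other.m2; return; }
        long total = n + other.n;
        double delta = other.mean - mean;
        mean += delta * other.n / total;
        m2 += other.m2 + delta * delta * ((double) n * other.n / total);
        n = total;
    }

    // Sample variance (divides by n - 1); 0.0 when fewer than two values.
    double variance() { return n > 1 ? m2 / (n - 1) : 0.0; }

    double stdDev() { return Math.sqrt(variance()); }

    public static void main(String[] args) {
        ColumnStats a = new ColumnStats(); // partial result from one split
        ColumnStats b = new ColumnStats(); // partial result from another split
        for (double x : new double[] {1, 2, 3}) a.add(x);
        for (double x : new double[] {4, 5}) b.add(x);
        a.merge(b);
        System.out.println("mean=" + a.mean + " std=" + a.stdDev());
    }
}
```

The design choice here is that partial results can be merged in any order, which fits the way combiners and reducers receive their inputs.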

47. The subsequent Job is not able to read the output of the previous job immediately stackoverflow.com

I have two sequential jobs, Job1 and Job2. The output of Job1 is written into HDFS. Job2 will download the output of Job1 to the local file system. However, I found that ...

48. Re-run the Hadoop job, will the partitioned mapoutput still go to the same Reducers? stackoverflow.com

In Hadoop, suppose the number of nodes is fixed (no server crash during the run). If I use the same partitioner (e.g., hash partitioning on the key of map output) to ...
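
For reference, a plain-Java sketch of the arithmetic used by Hadoop's default HashPartitioner: (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The class below is hypothetical and uses String keys for illustration; Hadoop's Text computes its hash differently, so the concrete partition numbers will differ. The sketch only shows that the partition index is a pure function of the key's hash and the number of reduce tasks. It says nothing about which node a given reduce task is scheduled on.

```java
// Hypothetical stand-in that mirrors the formula of Hadoop's HashPartitioner.
final class HashPartitionSketch {

    // Masking with Integer.MAX_VALUE clears the sign bit, so the result of
    // the modulo is never negative (Math.abs would fail for Integer.MIN_VALUE).
    static int partitionOfHash(int hash, int numReduceTasks) {
        return (hash & Integer.MAX_VALUE) % numReduceTasks;
    }

    static int partitionOf(Object key, int numReduceTasks) {
        return partitionOfHash(key.hashCode(), numReduceTasks);
    }

    public static void main(String[] args) {
        int numReduceTasks = 4; // illustrative value
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            // Same key and same reducer count give the same partition on every run,
            // provided hashCode() itself is stable across JVMs (true for String).
            System.out.println(key + " -> partition " + partitionOf(key, numReduceTasks));
        }
    }
}
```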

49. Trying to understand Map Reduce stackoverflow.com

I have a CDH3 pseudo-distributed cluster on my Ubuntu Server VM. It seems to be working fine as far as I can tell. So far, I have tried to run some ...

50. How is splitting and choosing hosts for splits done in Mumak? stackoverflow.com

I see that InputFormat, FileSplit and FileInputFormat files are involved in creating the splits of the input data for jobs in hadoop. I am interested in knowing how this splitting and choosing hosts ...

51. Is there any way for a fully distributed Hadoop/MapReduce program to have its individual nodes be reading local input files? stackoverflow.com

I am trying to set up a fully-distributed Hadoop/MapReduce instance where each node will be running a series of C++ Hadoop Streaming task on some input. However I don't want to ...

52. create trace files using rumen stackoverflow.com

I need to create trace files with rumen. How do I use rumen to create trace files from Job logs? Which type of job logs can be used and how do I generate ...

53. Exploring Hadoop code stackoverflow.com

I wanted to know about Hadoop more than a black box. I wanted to explore the Hadoop code itself. How can I download the bundle not from the trunk and where ...

54. how to run the mapreduce matrix multiplication example stackoverflow.com

Hi, I am in the process of learning MapReduce and Hadoop. I want to run the matrix multiplication example presented here along with its code: http://www.norstad.org/matrix-multiply/index.html I know it's a strange ...

55. Passing arguments to Hadoop mappers stackoverflow.com

I'm using the new Hadoop API and looking for a way to pass some parameters (a few strings) to mappers.
How can I do that? This solution works for the old API: http://www.hongliangjie.com/2011/01/16/passing-parameters-and-arguments-to-mapper-and-reducer-in-hadoop/

56. Apriori and association rules with Hadoop stackoverflow.com

Is it doable to create an Apriori app using map-reduce? I am starting out but it's not clear how to create the next Candidate sets based on a previous run. ...

57. Hadoop shuffle uses which protocol? stackoverflow.com

During the shuffle stage of Hadoop, the mapped data is transferred across the nodes of the cluster according to the partitions for the reducers. What protocol does Hadoop use for performing the shuffle ...

58. Comparing two large datasets using a MapReduce programming model stackoverflow.com

Let's say I have two fairly large data sets - the first is called "Base" and it contains 200 million tab delimited rows and the second is called "MatchSet" which has ...

59. Does hadoop really handle datanode failure? stackoverflow.com

In our Hadoop setup, when a datanode crashes (or Hadoop doesn't respond on the datanode), the reduce task fails, unable to read from the failed node (exception below). I thought hadoop handles data ...

60. Can inputs and outputs of hadoop be other than files? stackoverflow.com

I am trying to write a Hadoop MapReduce program in Java for which the input is an array and the output is also an array. But until now I have only seen ...

61. mapreduce matrix multiplication with hadoop stackoverflow.com

I am trying to run the matrix multiplication example described (with source code) at the following link: http://www.norstad.org/matrix-multiply/index.html I have Hadoop set up in pseudo-distributed mode and I configured it using this tutorial: