stream « hadoop « Java Database Q&A

1. How do I control the output file names and content of a Hadoop streaming job? stackoverflow.com

Is there a way to control the output filenames of a Hadoop Streaming job? Specifically, I would like my job's output files' content and names to be organized by the key the ...

2. How do I set Priority\Pool on a Hadoop Streaming job? stackoverflow.com

How can I set the Priority\Pool of a Hadoop Streaming job? It's probably a command-line jobconf parameter (e.g. -jobconf something=pool.name) but I haven't been able to find any documentation on this online... ...

3. Streaming data and Hadoop? (not Hadoop Streaming) stackoverflow.com

I'd like to analyze a continuous stream of data (accessed over HTTP) using a MapReduce approach, so I've been looking into Apache Hadoop. Unfortunately, it appears that Hadoop expects to start ...

4. Managing dependencies with Hadoop Streaming? stackoverflow.com

Had a quick Hadoop streaming question. If I'm using Python streaming and I have Python packages that my mappers/reducers require and that aren't installed by default, do I need to install those on ...

5. Including jar files in Hadoop streaming using Groovy stackoverflow.com

I love Hadoop streaming for its ability to quickly pump out quick and dirty one-off map reduce jobs. I also love Groovy for making all my carefully coded Java accessible ...

6. Hadoop streaming grep does not work stackoverflow.com

Grep seems not to be working for Hadoop streaming. For:

    hadoop jar /usr/local/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar -input /user/root/tmp2/user.data -output /user/root/selected_data -mapper '/bin/grep 1938678460' -reducer 'wc' -jobconf mapred.output.compress=false

I get:

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1

...
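One plausible cause of the exit code above: grep returns status 1 when it selects no lines, and a map task whose input contains no match would then look like a failed subprocess. The sketch below is a hypothetical Python replacement for the grep mapper, not something taken from the question. It filters on the same string but always exits with status 0. The names `PATTERN` and `filter_lines` are illustrative assumptions.

```python
#!/usr/bin/env python
"""Hypothetical streaming mapper: a grep-like filter that always exits 0."""
import sys

PATTERN = "1938678460"  # the string searched for in the question's command


def filter_lines(lines, pattern=PATTERN):
    """Yield only the lines that contain `pattern`."""
    for line in lines:
        if pattern in line:
            yield line


def main():
    for line in filter_lines(sys.stdin):
        sys.stdout.write(line)
    # Returning normally gives exit status 0 even when nothing matched,
    # whereas grep exits with status 1 in that case.


if __name__ == "__main__":
    main()
```

If this diagnosis applies, the script would be passed to `-mapper` in place of `/bin/grep 1938678460`; whether that is the actual cause in the question's job is an assumption.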

7. Hadoop Streaming in .NET stackoverflow.com

I am running hadoop in Pseudo-Distributed mode and using hadoop streaming to do my map-reduce operations. But the problem is I keep getting Streaming Job Failed error message. Following is the ...

8. Hadoop Streaming stackoverflow.com

If anyone is using hadoop streaming in .NET please share your experience and resources you've used in building your applications. And if there's any other option to run map-reduce jobs in ...

9. Hadoop Streaming with very large size of stdout! stackoverflow.com

I have two programs for Hadoop streaming.

  mapper (produces <k, v> pair)
  reducer

Of course, the <k, v> pair is emitted to stdout. My question is: if v in <k, v> is very large, ...

10. How to use a binary executable which takes filenames as arguments in hadoop streaming? stackoverflow.com

Say I have a binary executable which takes filenames as arguments, like 'myprog file1 file2', it reads from file1 and writes to file2. The binary executable does not take stdin and ...
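A minimal sketch of one way to approach this, assuming the program can be wrapped: a small Python script spools its standard input to a temporary file, runs the program on that file, and copies the program's output file back to standard output. The helper name `run_file_program` and the temporary file names are hypothetical. The wrapped command is taken from the script's own arguments, so nothing about the real `myprog` is assumed beyond the `myprog file1 file2` calling convention given in the question.

```python
#!/usr/bin/env python
"""Hypothetical wrapper: adapt a 'prog infile outfile' binary to stdin/stdout."""
import os
import shutil
import subprocess
import sys
import tempfile


def run_file_program(cmd, data):
    """Run cmd + [infile, outfile] on `data` (bytes); return the bytes written."""
    workdir = tempfile.mkdtemp()
    in_path = os.path.join(workdir, "file1")
    out_path = os.path.join(workdir, "file2")
    try:
        with open(in_path, "wb") as f:
            f.write(data)
        # Raises CalledProcessError if the program exits with a non-zero status.
        subprocess.check_call(list(cmd) + [in_path, out_path])
        with open(out_path, "rb") as f:
            return f.read()
    finally:
        shutil.rmtree(workdir, ignore_errors=True)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example invocation (assumed): wrapper.py ./myprog
    result = run_file_program(sys.argv[1:], sys.stdin.buffer.read())
    sys.stdout.buffer.write(result)
```

Note that this sketch buffers the task's whole input in memory before writing it out, which may not suit large splits; copying stdin to the temporary file in chunks would avoid that.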

11. How can I set block size of output files produced by a Hadoop Streaming job? stackoverflow.com

Seems like this should be simple; I have a set of files on our cluster with the cluster-default block size of 128MB. I have a streaming job that processes them, ...

12. Hadoop Streaming Omitting Very Large Records stackoverflow.com

I have a hadoop streaming program where reader.readline() has an OutOfMemoryException if the line passed in is too large (over 20M or so). Is there a way to tell hadoop ...
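The question's own language is not shown, so the following is only a Python analogue of the idea: read each line with a size cap and drop lines that exceed it, so an oversized record is never held in memory in full. The function name `bounded_lines` and the constant `MAX_LINE_BYTES` are assumptions; the 20 MB figure simply mirrors the "over 20M or so" in the question.

```python
#!/usr/bin/env python
"""Hypothetical mapper input loop that skips oversized lines."""
import sys

MAX_LINE_BYTES = 20 * 1024 * 1024  # assumed cap, mirroring "over 20M or so"


def bounded_lines(stream, limit=MAX_LINE_BYTES):
    """Yield lines that fit within `limit` bytes (newline included); skip the rest."""
    while True:
        chunk = stream.readline(limit)  # reads at most `limit` bytes
        if not chunk:
            return  # end of input
        if chunk.endswith(b"\n") or len(chunk) < limit:
            yield chunk
            continue
        # The line did not fit: discard its remainder piece by piece.
        while chunk and not chunk.endswith(b"\n"):
            chunk = stream.readline(limit)


def main():
    for line in bounded_lines(sys.stdin.buffer):
        sys.stdout.buffer.write(line)  # stand-in for the real record processing


if __name__ == "__main__":
    main()
```

This handles the problem on the script side only; it does not tell Hadoop itself to omit the records.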

13. How to read compressed bz2 (bzip2) Wikipedia dumps into stream xml record reader for hadoop map reduce stackoverflow.com

I am working on using Hadoop Map Reduce to do research on the wikipedia data dumps (compressed in bz2 format). Since these dumps are so big (5 T), I can't decompress ...

14. Problem with Hadoop Streaming -file option for Java class files stackoverflow.com

I am struggling with a very basic issue in hadoop streaming in the "-file" option. First I tried the very basic example in streaming:

    hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.203.0.jar -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer /bin/wc -inputformat KeyValueTextInputFormat -input gutenberg/* ...

15. java.io.IOException: error=2, No such file or directory error in Hadoop streaming stackoverflow.com

Please help with the "-file" option issue of Hadoop streaming (mentioned in the link below). Just to update, I know that the jar is already there, I am trying this after ...

16. Problem in running java class files with hadoop streaming stackoverflow.com

Whenever I try to use Java class files as my mapper and/or reducer, I get the following error: java.io.IOException: Cannot run program "MapperTst.class": java.io.IOException: error=2, No such file or directory I ...

17. Hadoop Streaming Problems stackoverflow.com

I ran into these issues while using Hadoop Streaming. I'm writing code in Python. 1) Aggregate library package: According to the Hadoop streaming docs ( http://hadoop.apache.org/common/docs/r0.20.0/streaming.html#Working+with+the+Hadoop+Aggregate+Package+%28the+-reduce+aggregate+option%29 ), there is an inbuilt ...

18. How to use Hadoop Pig streaming with a compiled C program? stackoverflow.com

I have been testing Hadoop Pig on a small cluster. I have successfully used Pig to stream Perl, Python, shell scripts and even jars, but not C binaries! I just built a simple ...

19. Write to different files using hadoop streaming stackoverflow.com

I'm currently processing about 300 GB of log files on a 10-server Hadoop cluster. My data is being saved in folders named YYMMDD so each day can be accessed quickly. My ...

20. copying to and from hdfs within Hadoop Streaming stackoverflow.com

I asked a similar question earlier, but after doing some exploring I have a better understanding of what's going on, and I'd like to see if other people have ...

21. Backward compatibility of Hadoop Streaming stackoverflow.com

AFAIK, Hadoop Streaming only supports text input, which means the data is organized by lines. But the mapper code will become messy if we want backward compatibility, supporting different versions of ...

22. Can't read Mahout generated sequence files with hadoop streaming stackoverflow.com

I am trying to stream a sequence file generated by one of the Mahout examples to see its contents:

    hadoop jar hadoop-streaming-0.20.2-cdh3u0.jar \
    -input /tmp/mahout-work-me/20news-bydate/bayes-test-input-output/ ...