1. How do you use MapReduce/Hadoop? stackoverflow.com

I'm looking for some general information about how other people are using Hadoop or other MapReduce-like technologies. In general, I am curious as to whether you are writing MR applications ...

2. Is there a .NET equivalent to Apache Hadoop? stackoverflow.com

So, I've been looking at Hadoop with keen interest, and to be honest I'm fascinated, things don't get much cooler. My only minor issue is I'm a C# developer and ...

3. Large data - storage and query stackoverflow.com

We have a huge data set of about 300 million records, which will get updated every 3-6 months. We need to query this data (continuously, in real time) to get some information. What are the options ...

4. Hadoop Distribution Differences stackoverflow.com

Can somebody outline the various differences between the various Hadoop Distributions available:

Cloudera - http://www.cloudera.com/hadoop
Yahoo - http://developer.yahoo.net/blogs/hadoop/

using the Apache Hadoop distro as a baseline. Is there a ...

5. Error in Hadoop MapReduce stackoverflow.com

When I run a mapreduce program using Hadoop, I get the following error. 10/01/18 10:52:48 INFO mapred.JobClient: Task Id : attempt_201001181020_0002_m_000014_0, Status : FAILED java.io.IOException: Task process exit with nonzero status of ...

6. Run Hadoop job without using JobConf stackoverflow.com

I can't find a single example of submitting a Hadoop job that does not use the deprecated JobConf class. JobClient, which hasn't been deprecated, still only supports methods that take ...

7. Hadoop searching words from one file in another file stackoverflow.com

I want to build a hadoop application which can read words from one file and search in another file. If the word exists - it has to write to one output file If ...

8. New to Hadoop and dumbo, how to correctly sequence these operations? stackoverflow.com

Consider the following log file format:

id        v1        v2        v3
1  ...

9. Is there anything like Hadoop in C++? stackoverflow.com

What is the closest thing like Hadoop, but in C++? In particular, I want to do distributed computing using MapReduce. Thanks!

10. Finding matching lines with Hadoop/MapReduce stackoverflow.com

I am playing around with Hadoop and have set up a two node cluster on Ubuntu. The WordCount example runs just fine. Now I'd like to write my own MapReduce program to ...

11. Computational Linguistics project idea using Hadoop MapReduce stackoverflow.com

I need to do a project for a Computational Linguistics course. Is there any interesting "linguistic" problem which is data intensive enough to work on using Hadoop map reduce? Solution or algorithm ...

12. Project Idea with Hadoop MapReduce stackoverflow.com

I learnt Hadoop a few months back and managed to do a very introductory programming project on it. I want to do a small - medium sized project or series of ...

13. Hadoop 0.2: How to read outputs from TextOutputFormat? stackoverflow.com

My reducer class produces outputs with TextOutputFormat (the default OutputFormat given by Job). I would like to consume these outputs after the MapReduce job completes to aggregate the outputs. In addition to ...

14. Hadoop: Iterative MapReduce Performance stackoverflow.com

Is it correct to say that the parallel computation with iterative MapReduce can be justified mainly when the training data size is too large for the non-parallel computation for the same ...

15. Where do I start with distributed computing? stackoverflow.com

I'm interested in learning techniques for distributed computing. As a Java developer, I'm probably willing to start with Hadoop. Could you please recommend some books/tutorials/articles to begin with?

16. Hadoop : Code shipped from master to slave stackoverflow.com

I launched a hadoop cluster and submitted a job to the master. The jar file is only contained in the master. Does hadoop ship the jar to all the slave machines ...

17. How does Hadoop perform input splits? stackoverflow.com

This is a conceptual question involving Hadoop/HDFS. Let's say you have a file containing 1 billion lines. And for the sake of simplicity, let's consider that each line is of the ...

18. Debugging hadoop applications stackoverflow.com

I tried printing out values using System.out.println(), but they won't appear on the console. How do I print out the values in a map/reduce application for debugging purposes using Hadoop? Thanks, Deepak.

19. Hadoop/MapReduce: Reading and writing classes generated from DDL stackoverflow.com

Can someone walk me through the basic work-flow of reading and writing data with classes generated from DDL? I have defined some struct-like records using DDL. For example:

  class Customer {
 ...

20. Global variables in hadoop stackoverflow.com

My program follows an iterative map/reduce approach. And it needs to stop if certain conditions are met. Is there any way I can set a global variable that can be distributed across ...

21. javax.security.auth.login.LoginException: Login failed stackoverflow.com

I'm trying to run a hadoop job (version 18.3) on my windows machine but I get the following error:

Caused by: javax.security.auth.login.LoginException: Login failed: CreateProcess: bash -c groups error=2
    ...

22. Is there a way to configure timeout for speculative execution in Hadoop? stackoverflow.com

I have a hadoop job with tasks that are expected to run for a significant length of time (a few minutes). However, hadoop starts speculative execution too soon. I do not want to turn ...

23. Map Reduce: ChainMapper and ChainReducer stackoverflow.com

I need to split my Map Reduce jar file in two jobs in order to get two different output file, one from each reducers of the two jobs. I mean that the ...

24. Tools for optimizing scalability of an Hadoop application? stackoverflow.com

I'm working with a team of mine on a small application that takes a lot of input (logfiles of a day) and produces useful output after several (now 4, in the ...

25. Where does hadoop mapreduce framework send my System.out.print() statements ? (stdout) stackoverflow.com

I want to debug a mapreduce script, and without going to much trouble tried to put some print statements in my program. But I can't seem to find them in any ...

26. Hadoop in windows : file not found exception stackoverflow.com

I'm using hadoop in windows and I've configured everything correctly (installing cygwin, passwordless ssh etc.). I've compiled the wordcount program in WC.jar and tried to run it. It's running perfectly in standalone ...

27. What is the computational complexity of the MapReduce overhead stackoverflow.com

Given that the complexity of the map and reduce tasks are O(map)=f(n) and O(reduce)=g(n) has anybody taken the time to write down how the Map/Reduce intrinsic operations (sorting, shuffling, sending data, ...

28. mapreduce distance calculation in hadoop stackoverflow.com

Is there a distance calculation implementation using hadoop map/reduce. I am trying to calculate a distance between a given set of points. Looking for any resources .. //edited ............ This is a very intelligent ...

29. custom word count using hadoop stackoverflow.com

I'm a beginner in hadoop. I've understood the WordCount program. Now I have a problem. I don't want the output of all the words.. - Words_I_Want.txt - hello echo raj - Text.txt - hello eveyone. I ...

30. Running a standalone Hadoop application on multiple CPU cores stackoverflow.com

My team built a Java application using the Hadoop libraries to transform a bunch of input files into useful output. Given the current load a single multicore server will do fine for ...

31. Why does DistributedCache mangle my file names stackoverflow.com

I have a weird problem, DistributedCache appears to change the names of my files, it uses the original name as the parent folder and adds the file as a child. i.e. ...

32. Manipulating iterator in mapreduce stackoverflow.com

I was trying to find the sum of any given points using hadoop, but my problem is on getting all values from a given key in a single reducer. It is ...

33. MultipleOutputFormat in hadoop stackoverflow.com

I'm a newbie in Hadoop. I'm trying out the Wordcount program. Now to try out multiple output files, I use MultipleOutputFormat. This link helped me in doing it.

34. Getting started with MapReduce/Hadoop stackoverflow.com

Lately, I have been reading a lot about MapReduce/Hadoop and think this is where the industry is currently moving to. I want to start learning MapReduce/Hadoop and I thought the best way ...

35. Hadoop Data Persistence in which format? stackoverflow.com

I have some experience with Lucene, I'm trying to understand how the data is actually stored in slave server in Hadoop framework?
Do we create an index in Slave Server with set ...

36. Efficient set operations in mapreduce stackoverflow.com

I have inherited a mapreduce codebase which mainly calculates the number of unique user IDs seen over time for different ads. To me it doesn't look like it is being done ...

37. Is "Adopting MapReduce model" = Universal answer to scalability? stackoverflow.com

I have been trying to understand the MapReduce concept and apply it to my current situation. What is my situation? Well, I have an ETL tool here, in which data transformation ...

38. Hadoop map/reduce chaining stackoverflow.com

I want to chain 2 Map/Reduce jobs. I am trying to use JobControl to achieve the same. My problem is - JobControl needs org.apache.hadoop.mapred.jobcontrol.Job which in turn needs org.apache.hadoop.mapred.JobConf which is deprecated. ...

39. Encoding image into Jpeg2000 using Distributed Computing like Hadoop stackoverflow.com

Just wondering if anybody has done, or is aware of, encoding/compressing a large image into JPEG2000 format using Hadoop? There is also this http://code.google.com/p/matsu-project/ which uses map reduce to process the image. Image size ...

40. Expected consumption of open file descriptors in Hadoop 0.21.0 stackoverflow.com

Given Hadoop 0.21.0, what assumptions does the framework make regarding the number of open file descriptors relative to each individual map and reduce operation? Specifically, what suboperations cause Hadoop ...

41. Using Hadoop to "bucket" data out with a single run stackoverflow.com

Is it possible to use one Hadoop job run to output data to different directories based on keys? My use case is server access logs. Say I have them all together, ...

42. Hadoop MapReduce InputFormat Deprecated? stackoverflow.com

I need to implement a custom (service) input source for a Hadoop MapReduce app. I google'd and SO'd and found that one way to proceed is to implement a custom InputFormat. ...

43. Implementation of an ArrayWritable for a custom Hadoop type stackoverflow.com

How do I define an ArrayWritable for a custom Hadoop type ? I am trying to implement an inverted index in Hadoop, with custom Hadoop types to store the data I have ...

44. Hidden features of Hadoop MapReduce stackoverflow.com

What are the hidden features of Hadoop MapReduce that every developer should be aware of? One hidden feature per answer, please.

45. how to perform ETL in map/reduce stackoverflow.com

how do we design mapper/reducer if I have to transform a text file line-by-line into another text file. I wrote a simple map/reduce programs which did a small transformation but the requirement ...

46. How can you create a file inside a hadoop map-reduce job? stackoverflow.com

I searched the web, but all I found was a site that claimed that it could be done. It didn't say how.

47. Make use of the relation name/table name/file name in Hadoop's MapReduce stackoverflow.com

Is there a way to use the relation name in MapReduce's Map and Reduce? I am trying to do Set difference using Hadoop's MapReduce. Input: 2 files R and S containing list ...

48. Mapfile as a input to a MapReduce job stackoverflow.com

I recently started to use Hadoop and I have a problem while using a Mapfile as a input to a MapReduce job. The following working code, writes a simple MapFile called "TestMap" ...

49. How to handle a datanode that dies during map/reduce stackoverflow.com

What happens when the datanode the map/reduce is using goes down? Shouldn't the job be redirected to another datanode? How should my code handle this exceptional condition?

50. Hadoop MapReduce throughput question stackoverflow.com

I am interested in what can be considered a good per-node throughput for lightweight text data processing in hadoop.
To be more specific I would ask: Let's say I ...

51. How to start learning Hadoop and Mapreduce? stackoverflow.com

How to start learning Hadoop and Mapreduce? Is there any tutorial on hardware requirement and development requirement setting? I am planning to use C++ and Java. Many thanks.

52. Objects from memory as input for Hadoop/MapReduce? stackoverflow.com

I am working on the parallelization of an algorithm, which roughly does the following:

Read several text documents with a total of 10k words.
Create an object for every word in the text corpus.
Create ...

53. Map reduce to compute SVD (Singular value decomposition) stackoverflow.com

Is it possible to parallelize SVD computing, using for example Hadoop's MAP REDUCE? Could you provide a simple example of it??

54. MapReduce recommendation stackoverflow.com

I've heard of Hadoop, but what else can I use to start in this topic...

What other APIs are there?
In general, what is needed to start programming here?
What do you recommend to ...

55. Hadoop Map-Reduce Code fails to pick driver files libcuddpp.so stackoverflow.com

Greetings to all, Today i came across a strange problem about non-root users in Linux ( CentOS ). I am able to compile & run a Java Program through below commands properly :

[root@cuda1 ...

56. Distributed Profiler for hadoop / mapreduce stackoverflow.com

I am looking to work on hadoop open source implementation and I was wondering if there is a distributed profiler for hadoop? In case, could someone point me to any links ...

57. Implementing a Tree Writable class stackoverflow.com

I would like to implement a TreeWritable class to represent a Tree structure. I have tried the following implementation but I'm getting a mapred.MapTask: Record too large for in-memory buffer error. How should ...

58. How to pass args to mapreduce program stackoverflow.com

I have to pass a third argument to a mapreduce program. I have to read a file given by the user in the mapreduce program.

59. Static object in map/reduce stackoverflow.com

I was trying to use a static object in hadoop. This object is both used in map and reduce. My program is :

read 100000 lines, thus 100000 maps.
for each mapper, a static attribute ...

60. All three constructors of org.apache.hadoop.mapreduce.Job are deprecated, what is the best way to construct a Job class? stackoverflow.com

All three constructors of org.apache.hadoop.mapreduce.Job are deprecated, is there a way to construct a Job class the non-deprecated way? Thanks.

61. hadoop chain map/reduce stackoverflow.com

I have chained 2 mappers followed by 1 reducer. Is it possible to write the intermediate outputs (o/p of each mapper in the chain) to HDFS? I tried setting the OutputPath ...

62. how to import the package org.apache.hadoop.mapreduce.lib.chain in a hadoop 0.20.2 project? stackoverflow.com

I'm trying to chain maps and reduces phases in one job. The problem is that I'm running under hadoop 0.20.2 and the package org.apache.hadoop.mapred.lib.Chain seems to be deprecated and replaced by ...

63. Linear filter (FIR) in Hadoop (Hadoop in Action exercise) stackoverflow.com

Exercise 4, Chapter 4 in Hadoop in Action is about implementing a linear filter computing the moving average of a time series. That is, given N and a series of timestamped ...

64. How can I write my own Hadoop scheduler? stackoverflow.com

I've been studying hadoop's scheduler mechanism recently. Using 0.20.2(fair&capacity included) Have read some papers, LATE\Deadline Scheduler... Has anyone tried? or is there a guide? thx anyway

65. project idea for hadoop stackoverflow.com

Hi, I'm a 3rd year college student majoring in software engineering and have had a little experience with Hadoop. I'm looking for an idea for a small to medium size project with hadoop. I want to do ...

66. Hadoop spilled records stackoverflow.com

I couldn't find any documentation on how hadoop handles spilled records. Is there a link that can be found online? Thanks for your time.

67. Error in using one MapReduce's output as another MapReduce's input stackoverflow.com

I have two Map/Reduce classes, named MyMappper1/MyReducer1 and MyMapper2/MyReducer2, and want to use the output of MyReducer1 as the input of MyMapper2, by setting the input path of job2 to the ...

68. Hadoop and MapReduce stackoverflow.com

I am new to HDFS and MapReduce and trying to calculate survey statistics. Input file is in this format: Age Points Sex Category - all 4 of them are numbers. Is ...

69. How to re-run whole map/reduce in hadoop before job completion? stackoverflow.com

I am using Hadoop Map/Reduce with Java. Suppose I have completed a whole map/reduce job. Is there any way I could repeat the whole map/reduce part only, without ending the job? I mean, ...

70. Hadoop pipes and new mapred package stackoverflow.com

Is there any work going on to port Hadoop pipes from mapred to mapreduce package? Thanks, Meg

71. How scalable is MapReduce in the original functional languages? stackoverflow.com

The Map-Reduce programming model stems from the map and reduce functions which are present in functional languages like Lisp and Scheme dating back many many years. I remember from university (early 90's) ...

72. How do I write a Hadoop map reduce job without using deprecated classes? stackoverflow.com

I know it's my OCD, but I can't stand to have a deprecated reference in my code. That said, the Hadoop tutorials, including the "The Definitive Guide" book, uses only deprecated classes ...

73. mapreduce count example stackoverflow.com

My question is about mapreduce programming in java. Suppose I have the WordCount.java example, a standard mapreduce program. I want the map function to collect some information, and return to the ...

74. Common examples/code of Hadoop in practice stackoverflow.com

Everywhere I go to learn about Hadoop I see the wordcount example. I want to look at some more code that has been written to solve some other ...

75. running multiple map reduce in hadoop pipes stackoverflow.com

I'm new to hadoop pipes. Can anyone tell me how to run two map reduce together in a single job (program) in hadoop pipes? My problem is that i want to ...

76. Oozie running own MapReduce workflow issue stackoverflow.com

Not sure if anyone has run into this issue. I am trying to use oozie for running a simple MapReduce job that searches for a string value in HDFS location and ...

77. Hadoop 'grep' example stackoverflow.com

In the Hadoop 'grep' example (that comes with the Hadoop package), what is the group parameter? Can you give me an example of that?

78. How do I use the MultipleTextOutputFormat using the new Hadoop API? stackoverflow.com

I would like to write multiple output files. How do I do this using Job instead of JobConf?

79. Problem starting tasktracker in hadoop under windows stackoverflow.com

I am trying to use hadoop under windows and I am running into a problem when I want to start tasktracker. For example:

$bin/start-all.sh

then the logs writes:

2011-06-08 16:32:18,157 ERROR org.apache.hadoop.mapred.TaskTracker: Can not ...

80. JoGL in Hadoop? Hadoop for graphics? stackoverflow.com

After reading this and this paper, I decided I want to implement a distributed volume rendering setup for large datasets on MapReduce as my undergraduate thesis work. ...

81. how to get files of fixed size in map-reduce job output stackoverflow.com

I have a use case where I want to process data and generate output of a fixed size, say 1 GB, i.e. each map-reduce job output should be 1 GB. Does anybody ...

82. Permutations with MapReduce stackoverflow.com

Is there a way to generate permutations with MapReduce? input file:

1  title1
2  title2
3  title3

my goal:

1,2  title1,title2
1,3  title1,title3
2,3  title2,title3
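
The goal listed above is every unordered pair of input rows (2-combinations) rather than full permutations. As a minimal sketch of that target output only, here is a plain-Java version with no Hadoop dependency; the `PairGenerator` class, the `Row` type, and the assumption that each line is an id followed by whitespace and a title are all illustrative and do not come from the question. In a real MapReduce job the same nested loop would typically sit in a reducer that receives all rows under one key, which is an assumption about one possible design, not the asker's method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch (not a Hadoop job) of the pairing shown in the question. */
final class PairGenerator {

    /** One input record: an id and its title (hypothetical helper type). */
    static final class Row {
        final String id;
        final String title;

        Row(String id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    /** Parses a well-formed line such as "1  title1" (id, whitespace, title). */
    static Row parse(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        return new Row(parts[0], parts[1]);
    }

    /** Returns every unordered pair (i < j) formatted as "id1,id2  title1,title2". */
    static List<String> pairs(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        for (String line : lines) {
            rows.add(parse(line));
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            for (int j = i + 1; j < rows.size(); j++) {
                Row a = rows.get(i);
                Row b = rows.get(j);
                out.add(a.id + "," + b.id + "  " + a.title + "," + b.title);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the pairs for the three sample rows from the question.
        for (String pair : pairs(Arrays.asList("1  title1", "2  title2", "3  title3"))) {
            System.out.println(pair);
        }
    }
}
```

The nested loop produces n(n-1)/2 pairs, so the output grows quadratically with the number of input rows; that growth is usually the main cost to consider before distributing this kind of self-join.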

83. How could I programmatically get all the job tracker and tasktracker information that is displayed by Hadoop in the web interface? stackoverflow.com

I'm using Cloudera's Hadoop distribution CDH-0.20.2CDH3u0. Is there any way I could get information such as jobtracker status, tasktracker status, and counters using a Java program running outside of the hadoop framework? I tried ...

84. MapReduce Job not showing my print statements on the terminal stackoverflow.com

I am currently trying to figure out what happens when you run a MapReduce job by putting some System.out.println() calls at certain places in the code, but none of those print statements ...

85. Doubling each number a number of times as specified by the user stackoverflow.com

I am new to hadoop and I am learning by using a few examples. I am currently trying to pass a file with random integers in it. For each and every number ...

86. Converting this code into Hadoop stackoverflow.com

I want to convert the code below to run in hadoop. Basically what I want to achieve is to run a mapper a number of times. Assuming the array is my ...

87. Recursive calculations using Mapreduce stackoverflow.com

I am working on map reduce program and was thinking about designing computations of the form where a1,b1 are the values associated with a key

  a1/b1, a1+a2/b1+b2, a1+a2+a3/b1+b2+b3 ...

So at ...

88. A good example in hadoop that needs iteration stackoverflow.com

I am currently implementing a parallel-for on hadoop to iterate the mapper a number of times as specified by the user. Can someone help me with a useful example that I ...

89. Reading large input files (10gb) through a java program stackoverflow.com

I am working with two large input files of the order of 5gb each. They are the output of Hadoop map reduce, but as I am not able to do dependency ...

90. Maximum file size that can be processed using Hadoop in 'pseudo distributed' mode stackoverflow.com

I am processing a file with 7+ million lines (~59 MB) on an Ubuntu 11.04 machine with this configuration:

Intel(R) Core(TM)2 Duo CPU     E8135  @ 2.66GHz, 2280 MHz
Memory: ...

91. How do I make an external reference table or database available to a Hadoop MapReduce job? stackoverflow.com

I am analyzing a large amount of files in a Hadoop MapReduce job, with the input files being in .txt format. Both my mapper and my reducer are written in Python. However, ...

92. how to set the mapreduce location in hadoop? stackoverflow.com

I'm new to Apache hadoop. I installed the prerequisite software and configured everything, and the eclipse plugin is also set up, but when I click the new hadoop location it's not ...

93. Parallel reducing with Hadoop mapreduce stackoverflow.com

I'm using Hadoop's MapReduce. I have a file as an input to the map function, the map function does something (not relevant for the question). I'd like my ...

94. How efficient are opensource computation platform like Hadoop etc.? stackoverflow.com

How efficient are opensource distributed computation frameworks like Hadoop? By efficiency, I mean CPU cycles that can be used for the "actual job" in tasks that are mostly pure computation. In ...

95. Hadoop mapreduce : Driver for chaining mappers within a MapReduce job stackoverflow.com

I have mapreduce job: my code Mapp class: public static class MapClass extends Mapper {

    @Override
    public void map(Text key, Text value, Context ...

96. Accessing .dat file from within a Jar file stackoverflow.com

I am trying to access a data file from a public class, both of which are located within a JAR file. However, when I execute the jar on a Hadoop cluster, ...

97. Problem with -libjars in hadoop stackoverflow.com

I am trying to run a MapReduce job on Hadoop but I am facing an error and I am not sure what is going wrong. I have to pass library jars which ...

98. How to create and read directories in Hadoop - Mapreduce Job working directory stackoverflow.com

I want to create a directory inside the working directory of a MapReduce job in Hadoop. For example by using: ...

99. Is Hadoop going to give me more benefits in my case? stackoverflow.com

I'm using Clojure to pull ten XML files hourly, each file is about 10 MB. This script is running on a server machine.
XML files are parsed and stored into RDBMS ...

100. Architecture and Design Document for the next generation MapReduce stackoverflow.com

I would like to know the details (architecture and design documents) about the next generation Apache MapReduce. Where are the sources to get more information about it?