jobs « hadoop « Java Database Q&A

1. How do I use a more recent version of a hadoop/lib jar in my map/reduce jobs? stackoverflow.com

Hadoop currently ships with commons-httpclient-3.0.1.jar in its lib folder. If I have a map/reduce task that requires commons-httpclient-3.1.jar, it does not seem to be sufficient to bundle this jar in the lib ...

2. Chaining multiple MapReduce jobs in Hadoop stackoverflow.com

In many real-life situations where you apply MapReduce, the final algorithms end up being several MapReduce steps. I.e. Map1 , Reduce1 , Map2 , Reduce2 , etc. So you have the output from ...
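
A minimal in-memory sketch of the Map1, Reduce1, Map2, Reduce2 pattern described above, in plain Python rather than the Hadoop API. The helper `run_mapreduce` and the two example steps (word count, then grouping words by count) are assumptions made for illustration; the point is only that the output of the first step becomes the input of the second.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Run one in-memory MapReduce step: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # sorted keys stand in for the shuffle/sort phase
        output.extend(reducer(key, groups[key]))
    return output


# Step 1 (hypothetical example): count how often each word occurs.
def map1(line):
    for word in line.split():
        yield word, 1


def reduce1(word, counts):
    yield word, sum(counts)


# Step 2 (hypothetical example): group words by their count.
def map2(pair):
    word, count = pair
    yield count, word


def reduce2(count, words):
    yield count, sorted(words)


def run_chain(lines):
    """Feed the output of step 1 into step 2."""
    step1_output = run_mapreduce(lines, map1, reduce1)
    return run_mapreduce(step1_output, map2, reduce2)


if __name__ == "__main__":
    print(run_chain(["a b a", "b c a"]))
```

On a real cluster the intermediate result would typically be written to a directory that the second job reads as its input, rather than passed in memory.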

3. Periodic hadoop jobs running (best practice) stackoverflow.com

Customers are able to upload URLs to the database at any time, and the application should process the URLs as soon as possible. So I need periodic Hadoop jobs running, or to run a Hadoop job automatically ...

4. How to combine multiple Hadoop MapReduce Jobs into one? stackoverflow.com

I have a massive amount of input data (that's why I use Hadoop) and there are multiple tasks that can be solved with various MapReduce steps of which the first mapper ...

5. Is it possible to pick specific machines to run a particular type of hadoop jobs? stackoverflow.com

As far as I understand hadoop architecture considers all machines to be equal with any task/job being able to run on all and any of the machines in the cluster. Is there ...

6. Variable/looping sequence of jobs stackoverflow.com

I'm considering using hadoop/mapreduce to tackle a project and haven't quite figured out how to set up a job flow consisting of a variable number of levels that should be processed ...

7. Pipelining hadoop map reduce jobs stackoverflow.com

I have five map reduce jobs that I am running separately. I want to pipeline them all together, so that the output of one job goes to the next job. Currently, I wrote shell ...

8. running multiple MapReduce jobs in hadoop stackoverflow.com

I want to run a chain of map reduce jobs, so the easiest solution seems to be jobcontroller. Say I have two jobs, job1 and job2, and I want to run ...

9. Does Hive QL have the same expressive power as writing your own MapReduce jobs directly on Hadoop? stackoverflow.com

To put it in other words, is there a problem that can be solved by directly defining your map reduce jobs, but for which you cannot form a Hive QL query? If yes, then ...

10. Want to compare two consecutive jobs on Hadoop stackoverflow.com

I want to know if I can compare two consecutive jobs in Hadoop. If not, I would appreciate it if anyone could tell me how to proceed with that. To be precise, ...

11. How to get names of the currently running hadoop jobs? stackoverflow.com

I need to get the list of names of the jobs that are currently running. But "hadoop job -list" gives me a list of job IDs. Is there a way to get names of the running ...

12. Does IBM General Parallel File System(GPFS) support Map/Reduce jobs? stackoverflow.com

I am studying various distributed file systems. Does IBM General Parallel File System (GPFS) support Map/Reduce jobs on its own, without using 3rd party software (like Hadoop Map/Reduce)? Thanks!

13. Hadoop jobs that read from Cassandra seem to run only on master (slaves completely idle) stackoverflow.com

But when I run the hadoop included wordcount example (dfs version), I see the load getting distributed to all the slaves. What is special about ColumnFamilyInputFormat in Cassandra? Do I need ...

14. Hadoop - sharing files between multiple jobs in a chain stackoverflow.com

I have written a map-reduce application that consists of two map-reduce phases.

binary input file -> m1 -> r1 -> m2 -> r2 -> text output

The input file to my application contains a ...

15. Where do I download all of the necessary classes to write Hadoop MapReduce jobs? stackoverflow.com

I've recently started working with Hadoop and have been learning how to write MapReduce jobs. All over the internet, I can find examples and tutorials for writing MapReduce jobs, but ...

16. Scripting Jobs on Hadoop stackoverflow.com

I have about 10 different MapReduce jobs to run with some large data sets. Is there any way I can script this to run one job after the other and at the ...
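
One way to script "one job after the other" is a small driver that runs each command in turn and stops at the first failure. The sketch below uses only Python's standard `subprocess` module; the function name `run_in_order` and the demo commands are assumptions for illustration, and a real list would hold whatever command lines launch your jobs.

```python
import subprocess
import sys


def run_in_order(commands):
    """Run each command in sequence and stop at the first non-zero exit code.

    Returns (number_of_commands_that_succeeded, failed_command_or_None).
    """
    succeeded = 0
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            return succeeded, command
        succeeded += 1
    return succeeded, None


if __name__ == "__main__":
    # Stand-in commands so the sketch runs anywhere; in practice each entry
    # would be the argument list that submits one of your jobs (hypothetical).
    demo_commands = [
        [sys.executable, "-c", "print('job 1 done')"],
        [sys.executable, "-c", "print('job 2 done')"],
    ]
    print(run_in_order(demo_commands))
```

Because `subprocess.run` blocks until the command exits, each job starts only after the previous one has finished.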

17. Hector's batch Mutation vs. using Hadoop jobs to load data into Cassandra? stackoverflow.com

Can someone highlight the pros and cons for Hector's batch Mutation and using Hadoop jobs to load data into Cassandra? I know in Hector you can do something like the following:

mutator.addInsertion(...);
mutator.execute();

And ...

18. Running jobs in parallel in hadoop stackoverflow.com

I am new to hadoop. I have set up a 2-node cluster. How do I run 2 jobs in parallel in hadoop? When I submit jobs, they run one by one in FIFO ...

19. How can I ensure certain hadoop jobs don't end up running in the same datanode when using Fair Scheduler? stackoverflow.com

When using the nutch crawler, the fetch jobs are created such that URLs from the same host end up in a single data node to maintain crawl politeness (1 QPS). However, certain hosts ...

20. hadoop : supporting multiple outputs for Map Reduce jobs stackoverflow.com

Seems like it is supported in Hadoop (reference), but I don't know how to use this. I want to:

a.) Map - Read a huge XML file ...

21. Hadoop: High CPU load on client side after committing jobs stackoverflow.com

I couldn't find an answer to my issue while sifting through some Hadoop guides: I am committing various Hadoop jobs (up to 200) in one go via a shell script on ...

22. viewing hadoop jobs status online on MacOS X stackoverflow.com

The documentation gives the URL http://localhost:50030/, but when I click it I get a 404.
All my jobs are running locally using Python streaming. I am ...

23. hadoop: optimum number of map/reduce jobs on Quad-core machine stackoverflow.com

I tried searching on Google but found no good reference.

- I have a Quad-core Ubuntu box running a map-reduce job.  
- running default ...

24. hadoop streaming jobs fail to report? stackoverflow.com

All jobs were running successfully using hadoop-streaming, but all of a sudden I started to see errors due to one of the worker machines

Hadoop job_201110302152_0002 failures on master

Attempt Task  ...

25. Hadoop job scheduling alongside jobs with slow mappers in 0.20.203 stackoverflow.com

I am managing a Hadoop cluster that is shared between a number of users. We frequently run jobs with extremely slow mappers. For example, we might have a 32 ...

26. How are data splits of jobs allocated to nodes? stackoverflow.com

I was interested in modifying the way the input data splits of jobs are allocated to particular nodes. I went through the JobInProgress code of hadoop but couldn't work out how ...

27. Hadoop jobs hanging waiting to be killed stackoverflow.com

I have multiple Hadoop jobs doing different processing. When an exception occurs in some of these (a custom business exception), it is propagated to the map() method and the job is killed right away. However some ...

28. Iterative map reduce jobs. How to take reducer output and feed it to the next stage? stackoverflow.com

Specifically, I am trying to find a way to compute the shortest path in a graph using map reduce. The one that I have come up with seems to require multiple ...
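
A minimal sketch of the iterative pattern this question describes, in plain Python rather than the Hadoop API: each round's reduce output is fed back in as the next round's map input until the distances stop changing. The names `relax_once` and `shortest_paths`, the adjacency-dict graph layout, and the assumption of non-negative edge weights are all illustrative choices, not taken from the question.

```python
import math
from collections import defaultdict


def relax_once(graph, dist):
    """One map/reduce round of shortest-path relaxation."""
    # Map phase: every node emits its own distance, plus a candidate
    # distance for each neighbour it can reach.
    emitted = defaultdict(list)
    for node, d in dist.items():
        emitted[node].append(d)
        if d != math.inf:
            for neighbour, weight in graph.get(node, {}).items():
                emitted[neighbour].append(d + weight)
    # Reduce phase: keep the smallest candidate seen for each node.
    return {node: min(candidates) for node, candidates in emitted.items()}


def shortest_paths(graph, source):
    """Repeat rounds until no distance changes (assumes non-negative weights)."""
    nodes = set(graph) | {n for edges in graph.values() for n in edges}
    dist = {node: math.inf for node in nodes}
    dist[source] = 0
    while True:
        new_dist = relax_once(graph, dist)  # previous output is the next input
        if new_dist == dist:
            return dist
        dist = new_dist


if __name__ == "__main__":
    example = {"a": {"b": 1, "c": 4}, "b": {"c": 2}, "c": {}}
    print(shortest_paths(example, "a"))
```

The driver loop is the part that would live outside the jobs themselves: it launches a round, compares the result with the previous one, and decides whether another round is needed.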