hadoop 1 « hadoop « Java Database Q&A

1. Experience with Hadoop?    stackoverflow.com

Have any of you tried Hadoop? Can it be used without the distributed filesystem that goes with it, in a shared-nothing architecture? Would that make sense? I'm also interested in any performance ...

2. Hadoop examples?    stackoverflow.com

I'm examining Hadoop as a possible tool with which to do some log analysis. I want to analyze several kinds of statistics in one run. Each line of my ...

3. java.io.IOException: Job failed! when running a sample app on my osx with hadoop-0.19.1    stackoverflow.com

bash-3.2$ echo $JAVA_HOME
/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
bash-3.2$ bin/hadoop dfs -copyFromLocal conf /user/yokkom/input2
bash-3.2$ bin/hadoop jar hadoop-*-examples.jar grep input2 output 'dfs[a-z.]+'
09/04/17 10:09:32 INFO mapred.FileInputFormat: Total input paths to process : 10
09/04/17 10:09:33 INFO mapred.JobClient: Running job: job_200904171309_0001
java.io.IOException: ...

4. hadoop behind the scenes    stackoverflow.com

Can someone explain what Hadoop is in terms of the ideas behind the software? What makes it so popular and/or powerful?

5. getting data in and out of hadoop    stackoverflow.com

I need a system to analyze large log files. A friend directed me to hadoop the other day and it seems perfect for my needs. My question revolves around getting ...

6. Java Generics & Hadoop: how to get a class variable    stackoverflow.com

I'm a .NET programmer doing some Hadoop work in Java and I'm kind of lost here. In Hadoop I am trying to set up a Map-Reduce job where the output key of ...

7. Dealing with Gigabytes of Data    stackoverflow.com

I am going to start on a new project. I need to deal with a hundred gigs of data in a .NET application. It is at a very early stage now to give ...

8. Distributing Video on a LAN to alternate Locations - Can the browser detect this?    stackoverflow.com

I'm the administrator for a company intranet and I'd like to start producing videos. However, we have a very small bandwidth tunnel between our locations, and I'd like to avoid hogging ...

9. Look up values in a BDB for several files in parallel    stackoverflow.com

What is the most efficient way to look up values in a BDB for several files in parallel? If I had a Perl script which did this for one file at ...

10. What is Hadoop?    stackoverflow.com

I want to know what Hadoop is. I have gone through Google and Wikipedia, but I am not clear on what Hadoop actually is and what the goal is of ...

11. Is Hadoop right for running my simulations?    stackoverflow.com

I have written a stochastic simulation in Java, which loads data from a few CSV files on disk (totaling about 100MB) and writes results to another output file (not much data, just ...

12. Can Hadoop be restricted to spare CPU cycles?    stackoverflow.com

Is it possible to run Hadoop so that it only uses spare CPU cycles? I.e. would it be feasible to install Hadoop on people's work machines so that number crunching ...

13. How to parallelize execution on remote systems    stackoverflow.com

What's a good method for assigning work to a set of remote machines? Consider an example where the task is very CPU and RAM intensive, but doesn't actually process a ...

14. hadoop- determine if a file is being written to    stackoverflow.com

Is there a way to determine if a file in Hadoop is being written to? E.g., I have a process that puts logs into HDFS. I have another process ...

15. Which Hadoop product is more appropriate for a quick query on a large data set?    stackoverflow.com

I am researching Hadoop to see which of its products suits our need for quick queries against large data sets (billions of records per set). The queries will be performed against chip ...

16. Converting word docs to pdf using Hadoop    stackoverflow.com

Say I want to convert 1000s of Word files to PDF. Would using Hadoop to approach this problem make sense? Would using Hadoop have any advantage over simply using ...

17. Question on hadoop "java.lang.RuntimeException: java.lang.ClassNotFoundException: "    stackoverflow.com

Here's my source code

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class PageRank {

public static final String MAGIC_STRING ...

18. Question about using C# to talk to Hadoop FileSystem    stackoverflow.com

Currently my application uses C# with Mono on Linux to communicate with local file systems (e.g. ext2, ext3). The basic operations are open a file, write/read from file and close/delete the ...

19. Remote java program execution using ftp, very large dataset on remote machine - program to data vs data to program    stackoverflow.com

I am developing a java based application; its pertinent requirements are listed below

  • Large datasets exist on several machines on the network. My program needs to (remotely) execute a Java program to process ...

20. Very basic question about Hadoop and compressed input files    stackoverflow.com

I have started to look into Hadoop. If my understanding is right, I could process a very big file and it would get split over different nodes; however, if the file ...

21. Dynamic Nodes in Hadoop    stackoverflow.com

Is it possible to add new nodes to Hadoop after it is started? I know that you can remove nodes (since the master tends to keep tabs on the node ...

22. Generating Multiple Output files with Hadoop 0.20+    stackoverflow.com

I am trying to output the results of my reducer to multiple files. The data results are all contained in one file, and the rest of the results are split based ...

23. Any tested Frameworks/Solutions similar to Apache Hadoop?    stackoverflow.com

I am interested in the Apache Hadoop project, but I would like to know if any other tested (please mind the 'tested') projects/frameworks are out there. Appreciate any information/links to projects similar ...

24. Hadoop: Disadvantages of using just 2 machines?    stackoverflow.com

I want to do log parsing of huge amounts of data and gather analytic information. However all the data comes from external sources and I have only 2 machines to store ...

25. Running multiple hadoop instances on same machine    stackoverflow.com

I wish to run a second instance of Hadoop on a machine which already has an instance of Hadoop running. After untar'ing the Hadoop distribution, some config files need to be changed from ...

26. What should be hadoop.tmp.dir?    stackoverflow.com

Hadoop has a configuration parameter hadoop.tmp.dir which, as per the documentation, is "A base for other temporary directories." I presume this path refers to the local file system. I set this value to /mnt/hadoop-tmp/hadoop-${user.name}. ...

27. Matching large datasets using Hadoop?    stackoverflow.com

I would love to get a sense of whether Hadoop is the right tool for the problem I have. I'm building an offline process (once a month or once a quarter) that matches 2 ...

28. Splitting large XML files into manageable sections for Hadoop    stackoverflow.com

Is there an input class to deal with [multiple] large XML files based on their tree structure in Hadoop? I have a set of XML files that are of the same ...

29. Hadoop - job statistics    stackoverflow.com

I used hadoop to run map-reduce applications on our cluster. The jobs take around 10 hours to complete daily. I want to know the time taken for each job, and the ...

30. Free data warehouse - Infobright, Hadoop/Hive or what?    stackoverflow.com

I need to store a large amount of small data objects (millions of rows per month). Once they're saved they won't change. I need to:

  • store them securely
  • use them for analysis (mostly ...

31. what is a data serialization system?    stackoverflow.com

According to the Apache Avro project, "Avro is a serialization system". By saying data serialization system, does it mean that Avro is a product or an API? Also, I am not quite sure about ...

32. Better to build or buy a compute grid platform?    stackoverflow.com

I am looking to do some quite processor-intensive brute force processing for string matching. I have run my prototype in a multi-threaded environment and compared the performance to an implementation ...

33. Tracking Hadoop job status via web interface? (Exposing Hadoop to internal clients in the company)    stackoverflow.com

I want to develop a website that will allow analysts within the company to run Hadoop jobs (choose from a set of defined jobs) and see their job's status\progress. Is there an ...

34. How to learn using Hadoop    stackoverflow.com

I want to learn hadoop. However, I don't have access to a cluster now. Is it possible for me to learn it and use it for writing programs and learn it ...

35. Running Hadoop example in pseudo-distributed mode on vm    stackoverflow.com

I have set up Hadoop on an openSUSE 11.2 VM using VirtualBox. I have made the prerequisite configs. I ran this example in the standalone mode successfully. But in pseudo-distributed mode I get ...

36. Free Large datasets to experiment with Hadoop    stackoverflow.com

Do you know of any large datasets to experiment with Hadoop which are free/low cost? Any related pointers/links are appreciated. Preference:

  • At least one GB of data.
  • Production log data of a web server.
A few of them which I found ...

37. Classnotfound exception while running hadoop    stackoverflow.com

I am new to Hadoop. I have a file WordCount.java which refers to hadoop.jar and stanford-parser.jar. I am running the following commands:

javac -classpath .:hadoop-0.20.1-core.jar:stanford-parser.jar -d ep WordCount.java 

jar cvf ep.jar -C ep .

bin/hadoop ...

38. Efficient way to store a graph for calculation in Hadoop    stackoverflow.com

I am currently trying to perform calculations like clustering coefficient on huge graphs with the help of Hadoop. Therefore I need an efficient way to store the graph in a way ...

39. Which Hadoop API version should I use?    stackoverflow.com

In the latest Hadoop Studio the 0.18 API of Hadoop is called "Stable" and the 0.20 API of Hadoop is called "Unstable". The distribution that comes from Yahoo is a ...

40. getting close to real-time with hadoop    stackoverflow.com

I need some good references for using Hadoop for real-time systems like searching with little response time. I know Hadoop has its overhead of HDFS, but what's the best way of ...

41. Repository organization for Hadoop project    stackoverflow.com

I am starting on a new Hadoop project that will have multiple Hadoop jobs (and hence multiple jar files). Using Mercurial for source control, I was wondering what would be the optimal way ...

42. Trying to find org.apache.hadoop.io.LongWritable    stackoverflow.com

I'm trying to create a simple project with hadoop. I am new to IntelliJ and am trying to set the classpath to org.apache.hadoop.io. But what jar has this class?

43. Hadoop development environment, what does yours look like?    stackoverflow.com

I would like to know what your Hadoop development environment looks like.
Do you deploy jars to test cluster, or run jars in local mode?
What IDE do you use and what plugins ...

44. How to merge 2 bzip2'ed files?    stackoverflow.com

I want to merge 2 bzip2'ed files. I tried appending one to another: cat file1.bzip2 file2.bzip2 > out.bzip2 which seems to work (this file decompressed correctly), but I want to use ...

45. Making graphs of hadoop runs    stackoverflow.com

On some websites (like in this PDF : http://sortbenchmark.org/Yahoo2009.pdf) I see very nice graphs that visualize what a Hadoop cluster is doing at what moment. Were these made "manually" (i.e. ...

46. What do you recommend for a Hadoop book?    stackoverflow.com

I've started getting into technology books to read. I want to learn Hadoop, and I find that I enjoy just reading books rather than staring at a computer screen ...

47. hadoop null pointer exception    stackoverflow.com

import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.xml.soap.Text;


import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class blur {
public static class BlurMapper extends MapReduceBase implements Mapper<Text, BytesWritable, LongWritable, BytesWritable>
{
    OutputCollector<LongWritable, BytesWritable> goutput;

  ...

48. urgent Attention Required-hadoop: BufferedImage and ConvolveFilter-->JHLabs:    stackoverflow.com

Sorry to disturb again, but I like learning here. I am using the JHLabs library on filters for buffered images. On running my code I am getting this exception:

java.lang.ArrayIndexOutOfBoundsException: 4
    at ...

49. HadoopDb Java Program    stackoverflow.com

First of all, thanks for showing interest. I'm Adarsh Sharma, presently working on Hadoop technologies such as Hive, Hadoop, HadoopDB, HBase etc. I have configured HadoopDB on the Hadoop cluster of 3 ...

50. How to use custom pool assignment for FairScheduler in Hadoop?    stackoverflow.com

I am trying to take advantage of multiple pools in FairScheduler. But all my jobs are submitted by a single agent process and therefore all belong to the same user. I have set ...

51. Hadoop... Text.toString() conversion problems    stackoverflow.com

I'm writing a simple program for enumerating triangles in directed graphs for my project. First, for each input arc (e.g. a b, b c, c a, note: a tab symbol serves ...

52. Hadoop beginners    stackoverflow.com

I'm trying to practice some data mining algorithms over hadoop. Can I do it with HDFS alone or do I need to use the sub-projects like hive/hbase/pig? Thanks, ram.

53. Hadoop job fails when invoked by cron    stackoverflow.com

I have created the following shell script for invoking a hadoop job:

#!/bin/bash
/opt/hadoop/bin/hadoop jar /path/to/job.jar com.do.something <param-1> ... <param-n> &
wait %1
STATUS=$?
if [ $STATUS -eq 0 ]
then    
   ...

54. How to avoid OutOfMemoryException when running Hadoop?    stackoverflow.com

I'm running a Hadoop job over 1.5 TB of data, doing a lot of pattern matching. I have several machines with 16GB RAM each, and I always get OutOfMemoryException on this job ...

55. Hadoop block size issues    stackoverflow.com

I've been tasked with processing multiple terabytes worth of SCM data for my company. I set up a hadoop cluster and have a script to pull data from our SCM servers. ...

56. Why does the Hadoop incompatible namespaceIDs issue happen?    stackoverflow.com

This is a fairly well-documented error and the fix is easy, but does anyone know why Hadoop datanode NamespaceIDs can get screwed up so easily or how Hadoop assigns the NamespaceIDs ...

57. How to convert a Hadoop Path object into a Java File object    stackoverflow.com

Is there a way to change a valid and existing Hadoop Path object into a useful Java File object? Is there a nice way of doing this or do I need ...

58. How does Hadoop's RunJar method distribute class/jar files across nodes?    stackoverflow.com

I'm trying to use JIT compilation in clojure to generate mapper and reducer classes on the fly. However, these classes aren't being recognized by the JobClient (it's the usual ClassNotFoundException.) If I ...

59. What does this Java Syntax mean?    stackoverflow.com

In the code below, what do Iterator<V> and OutputCollector<K, V> mean? Are they special data types?

public void reduce(K key, 
  Iterator<V> values, 
  OutputCollector<K, V> output, 
  ...
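
As a rough illustration of the syntax asked about above: `Iterator<V>` and `OutputCollector<K, V>` are ordinary generic types, where `K` and `V` are type parameters that the caller fills in with concrete types. The sketch below is self-contained and does not use Hadoop. `Collector` is a hypothetical stand-in for Hadoop's `OutputCollector`, and `GenericsSketch`, `sumReduce` and `demo` are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Hadoop's OutputCollector<K, V>; not the real interface.
interface Collector<K, V> {
    void collect(K key, V value);
}

class GenericsSketch {
    // K is a type parameter: the caller decides the key type.
    // Iterator<Integer> means "an iterator that yields Integer values".
    static <K> void sumReduce(K key, Iterator<Integer> values, Collector<K, Integer> output) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        output.collect(key, sum);
    }

    // Runs sumReduce with K = String and returns what was collected as "key=value" strings.
    static List<String> demo() {
        List<String> results = new ArrayList<>();
        Collector<String, Integer> collector = (k, v) -> results.add(k + "=" + v);
        sumReduce("hadoop", Arrays.asList(1, 2, 3).iterator(), collector);
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Here the same `sumReduce` method would work unchanged with any key type, which is the point of writing `K` instead of a concrete class.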

60. hadoop inputFile as a BufferedImage    stackoverflow.com

Sorry for my poor English. I hope you'll understand my problem. I have a question about Hadoop development. I have to train myself on a simple image processing project using Hadoop. All I want ...

61. Hadoop ToolRunner fails with NoClassDefFoundError    stackoverflow.com

I am brand new to Linux, Java, and Hadoop. I have created a simple MapReduce Driver that implements the Tool interface. But when I try to run the ...

62. How can I run Hadoop with a Java class?    stackoverflow.com

I am following the book Hadoop: The Definitive Guide. I am confused by example 3-1. There is a Java source file, URLCat.java. I use javac to compile it into URLCat.class, then ...

63. Libraries/Tools for Website Parsing    stackoverflow.com

I would like to start working with parsing large numbers of raw HTML pages into semantic data structures. Just interested in the community opinion on various available tools for such a task, ...

64. Hadoop and 3d Rendering of images    stackoverflow.com

I have to do a project on distributed rendering of a 3D image. I can use standard algorithms. The aim is to learn Hadoop and not image processing. So can anyone ...

65. Idle hadoop master - how to make it do some work?    stackoverflow.com

I have launched a small cluster of two nodes and noticed that the master stays completely idle while the slave does all the work. I was wondering what is the way ...

66. Create a hadoop jar with external dependencies using Gradle    stackoverflow.com

How do I create a hadoop jar that includes all dependencies in the lib folder using Gradle? Basically, similar to what fatjar does.

67. When is it an overkill to use Hadoop?    stackoverflow.com

I have an Oracle database (roughly 1.2 billion records) of data with a web application sitting on top of it that generates queries (generates SQL code and returns counts). Basically you ...

68. How to run a Hadoop program?    stackoverflow.com

I have set up Hadoop on my laptop and ran the example program given in the installation guide successfully. But, I am not able to run a program.

rohit@renaissance1:~/hadoop/ch2$ hadoop ...

69. Problem while executing hadoop code    stackoverflow.com

I just started with Hadoop. I wrote some sample Hadoop code as it was written in the book. Still, exceptions arise during execution. The snippet of what I ...

70. Read a long string into memory    stackoverflow.com

I have a very large string, and when I read it in Java, I get an out-of-memory error. Actually, I need to read all of this string into memory ...

71. Distributed, error-handling, copying of TB's of data    stackoverflow.com

We have a box that has terabytes of data (10-20TB) each day, where each file on the drive is anywhere from megabytes to gigabytes. We want to send all these files to ...

72. Hadoop query regarding setJarByClass method of Job class    stackoverflow.com

In the Hadoop API documentation, setJarByClass is described as follows: public void setJarByClass(Class cls): "Set the Jar by finding where a given class came from." What exactly does this explanation ...
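
One way to picture "finding where a given class came from" is to ask the JVM for the location of a class's `.class` resource. The sketch below uses only the JDK and is an assumption about the general idea, not Hadoop's actual implementation; `JarLocator` and `findClassResource` are hypothetical names introduced for illustration.

```java
import java.net.URL;

class JarLocator {
    // Returns the URL of the .class resource for cls, or null if it cannot be found.
    // For a class packaged in a jar, this is typically a "jar:file:...!/..." URL,
    // from which the path of the containing jar can be read.
    static URL findClassResource(Class<?> cls) {
        String resource = "/" + cls.getName().replace('.', '/') + ".class";
        return cls.getResource(resource);
    }

    public static void main(String[] args) {
        // A JDK class: the URL shows which runtime archive or module it was loaded from.
        System.out.println(findClassResource(java.util.ArrayList.class));
        // The sketch's own class: may be a file:, jar: or other URL depending on how it is run.
        System.out.println(findClassResource(JarLocator.class));
    }
}
```

Under this reading, passing the job's main class lets the framework work out which jar to ship to the cluster without the path being spelled out by hand.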

73. Ad Hoc Reports Hadoop    stackoverflow.com

Hey guys, I want to allow people to put in simple text search terms, run a Pig job (if that's best? it's what I know best) and output the results (the tsv file ...

74. Running Hadoop examples halt in Pseudo-Distributed mode    stackoverflow.com

Everything runs well in standalone mode, and when going to pseudo-distributed mode, HDFS works well: I can put files into HDFS and browse it. I also checked ...

75. How to compile and set up Sizzle, an open source Sawzall implementation for Hadoop, on Mac OS X?    stackoverflow.com

'Sizzle is an open source implementation of the Sawzall programming language designed for interoperation with the Hadoop MapReduce and DFS stack.' https://github.com/anthonyu/Sizzle

76. Read and Write a file in hadoop in pseudo distributed mode    stackoverflow.com

I want to open/create a file and write some data to it in a Hadoop environment. The distributed file system I am using is HDFS. I want to do it in pseudo ...

77. 1 million sentences to save in DB - removing non-relevant English words    stackoverflow.com

I am trying to train a Naive Bayes classifier with positive/negative words extracted from a sentiment. Example: I love this movie :)) I hate when it rains :( ...

78. Apache Hadoop : Can it do "time-varying" input?    stackoverflow.com

I haven't found an answer to this even after a bit of googling. My input files are generated by a process which chunks them out at say, when the file touches ...

79. How to create the hadoop-0.21.0-core.jar using the source code?    stackoverflow.com

How to create the hadoop-0.21.0-core.jar using the source code? I have checked out the source code from svn. Now I have three dirs: common, hdfs, mapred. I want to build the hadoop-0.21.0-core.jar to run a ...

80. EOFException thrown by a Hadoop pipes program    stackoverflow.com

First of all, I am new to Hadoop. I have a small Hadoop pipes program that throws java.io.EOFException. The program takes as input a small text file and uses hadoop.pipes.java.recordreader ...

81. How can I use multiple input files as an input file?    stackoverflow.com

I want to use multiple files (actually 2 files) as input files. They have the same data patterns. Finally, I want to get the diff of the data from the two input files. For example, in a ...

82. how does netezza work? how does it compare to Hadoop?    stackoverflow.com

I want to understand whether Netezza/Hadoop is the right choice for the purposes below: pull feed files, of considerable size (at times more than a GB), from several online sources; clean, filter, transform and ...

83. Why isn't Hadoop implemented using MPI?    stackoverflow.com

Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes. What are the technical reasons for this? I could hazard a few guesses, ...

84. Hadoop job taking input files from multiple directories    stackoverflow.com

I have a situation where I have multiple files (100+, of 2-3 MB each) in compressed gz format, present in multiple directories. For example:
A1/B1/C1/part-0000.gz
A2/B2/C2/part-0000.gz
A1/B1/C1/part-0001.gz
I have to feed ...

85. Knowledge mining using Hadoop    stackoverflow.com

I want to do a project with Hadoop and MapReduce and present it as my graduation project. To this end, I've given it some thought, searched the internet and came up with the ...

86. Column store on top of hadoop?    stackoverflow.com

Is there a column store similar to Vertica that is built on top of Hadoop? I am not talking about HBase, as it is a sparse matrix store and cannot get ...

87. Hadoop certification    stackoverflow.com

Has anyone here attended the Cloudera training and certification? How was the certification exam? Anything that would make the exam easy?

88. Does hadoop eclipse-plugin support argument    stackoverflow.com

I downloaded the Hadoop Eclipse plug-in from this website: https://issues.apache.org/jira/browse/MAPREDUCE-1262 Thus, I can run Hadoop programs inside Eclipse, but I don't know how to use arguments in this plugin. For example jar ...

89. Can Hadoop run on Nginx?    stackoverflow.com

Is it possible to run Hadoop on Nginx? If so, is there any reference?

90. root of java installation    stackoverflow.com

I am trying to set up Apache Hadoop on my system. The procedure page says "edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your ...
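
As a hedged aside on locating that root: a running JVM reports its own installation directory through the standard `java.home` system property. The sketch below simply prints it; `JavaHomeSketch` and `runningJavaHome` are illustrative names. Note that on JDK 8 and earlier, `java.home` typically points at the `jre` subdirectory of a JDK, in which case the JDK root is its parent directory.

```java
class JavaHomeSketch {
    // Returns the installation directory of the JVM that is running this program.
    static String runningJavaHome() {
        return System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println("java.home = " + runningJavaHome());
    }
}
```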

91. How do I view standard out in hadoop?    stackoverflow.com

I'm new to hadoop and trying to get my first non-trivial program working, and want to view standard out for debugging purposes. It's my understanding that standard out is directed into ...

92. Why does creating a Path in hadoop cause a NullPointerException?    stackoverflow.com

I'm new to hadoop and trying to create a file in HDFS from within the mapper of a map-reduce job. The following code produces a NullPointerException in the last line:

DistributedFileSystem dfs = ...

93. Why does checking whether a file exists in hadoop cause a NullPointerException?    stackoverflow.com

I'm trying to create or open a file to store some output in HDFS, but I'm getting a NullPointerException when I call the exists method in the second to last line ...

94. In hadoop, how do I initialize a DistributedFileSystem object via the initialize method?    stackoverflow.com

There are two arguments, a URI and a Configuration. I assume that the JobConf object that the client is set to should work for Configuration, but what about the URI? Here is ...

95. How do I append to a file in hadoop?    stackoverflow.com

I want to create a file in HDFS that has a bunch of lines, each generated by a different call to map. I don't care about the order of the lines, ...

96. Are these Hadoop setup/cleanup/run times reasonable?    stackoverflow.com

I've set up and am testing out a pseudo-distributed Hadoop cluster (with namenode, job tracker, and task tracker/data node all on the same machine). The box I'm running on has about ...

97. Hadoop Input files Order    stackoverflow.com

I have data files arranged in folders named as dates. Directory structure

  • /data/2011/01/01
  • /data/2011/01/02
and so on. Inside each directory there are around 50 files I need to parse, and I am ...

98. Hadoop 0.21.0 java.lang.NoSuchMethodError: ProgramDriver    stackoverflow.com

I have a simple Hadoop job that I successfully compiled and ran on Hadoop 0.20.2. Now I am compiling against Hadoop 0.21.0, which works fine, but trying to run it yields ...

99. asking about apache zookeeper    stackoverflow.com

Hello, I am Mohamad, a master's degree student. I want to ask a question about ZooKeeper. I read that for a write operation in ZooKeeper to be done, first the server connected ...

100. How to contribute to apache?    stackoverflow.com

I am an intermediate Java learner. I want to contribute to Apache development. I saw there is a list of Apache projects (like Hadoop, Derby etc). I have developed certain queries which I would like ...