hadoop 2 « hadoop « Java Database Q&A

1. Hadoop job tracker only accessible from localhost    stackoverflow.com

I'm setting up Hadoop (0.20.2). For starters, I just want it to run on a single machine - I'll probably need a cluster at some point, but I'll worry about that ...

2. Executing helloworld.java in apache hadoop    stackoverflow.com

Can someone please tell me how I can execute my HelloWorld.java in Apache Hadoop? It contains:

class Helloworld  
{  
  public static void main(String[] args)  
 ...

3. Hadoop Machine learning/Data mining project idea?    stackoverflow.com

Hey guys, I am a graduate CS student (Data mining and machine learning) and have a good exposure to core JAVA (>4 years). I have read up a bunch of stuff ...

4. org.apache.hadoop.mapred.FileAlreadyExistsException    stackoverflow.com

I was trying to run the example program in Hadoop given here. When I try to run it, I get an org.apache.hadoop.mapred.FileAlreadyExistsException:

emil@psycho-O:~/project/hadoop-0.20.2$ bin/hadoop jar jar_files/wordcount.jar org.myorg.WordCount jar_files/wordcount/input jar_files/wordcount/output
11/02/06 14:54:23 INFO ...

5. Hue install on vanilla hadoop    stackoverflow.com

Has anyone tried to install HUE on Apache Hadoop? We are using hadoop 0.20.2 and I want to know if anyone has had success with it before I invest time doing ...

6. Hadoop safemode recovery - taking too long!    stackoverflow.com

I have a Hadoop cluster with 18 data nodes. I restarted the name node over two hours ago and the name node is still in safe mode. I have been searching for why ...

7. Hadoop Sequence File usage question    stackoverflow.com

I have a use case to upload some terabytes of text files as sequence files on HDFS. These text files have several layouts ranging from 32 to 62 columns (metadata). What would be ...

8. Has anybody used the WebPIE reasoner?    stackoverflow.com

Has somebody got WebPIE working? I tried to make it run, but it doesn't reason anything in the output. If I pass the same input to other reasoners (e.g. Pellet) it ...

9. Hadoop Single Node : Permission Denied    stackoverflow.com

I just installed a single-node Hadoop, but when I run it by logging in on localhost, it gives an error that it cannot make changes to files because permission is denied.

10. Problem building Hadoop from source: JUnit cannot be resolved?    stackoverflow.com

I'm trying to build Hadoop from source (so I can edit some of the task scheduling), but I'm getting an error with "import org.junit cannot be resolved". However, junit is in ...

11. Sequence Files in Hadoop    stackoverflow.com

How are these sequence files generated ? I saw a link about sequence file here,

http://wiki.apache.org/hadoop/SequenceFile
Are these written using default Java serializer ? and How do I read a sequence file ...

12. I need to read the same XML file twice: in main() and from DistributedCache using the same method    stackoverflow.com

XML file name, located on HDFS is passed to my Hadoop apps via command line. I put this file into DistributedCache and I am able to read it from DistributedCache in ...

13. Can I call Hadoop APIs from my Axis2 Web Service?    stackoverflow.com

I intend to develop a web service which can talk to the Hadoop master node to perform some tasks. These tasks include: 1. Starting and stopping the Hadoop cluster 2. Adding and deleting slave nodes from Hadoop ...

14. Load Testing in Hadoop    stackoverflow.com

Can anyone tell me how to use SLG(Synthetic Load Generator) in hadoop? Thanks in advance,
Neo

15. Unable to understand a part of TextInputFormat    stackoverflow.com

I was looking into TextInputFormat class in hadoop and started digging into LineRecordReader. I noticed that it has a block of code :

  if (start != 0) {
   ...
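
For orientation, here is a minimal plain-Java sketch of the idea behind that `start != 0` check as it is commonly understood: a reader whose split does not begin at byte 0 backs up one byte and discards everything through the next newline, because the previous split's reader finishes any line that straddles the boundary. This is not Hadoop's code. The class `SplitLineSketch` and method `readSplit` are hypothetical names, and the in-memory byte array stands in for the HDFS input stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual LineRecordReader.
final class SplitLineSketch {

    // Returns the lines owned by the byte range [start, end) of data.
    static List<String> readSplit(byte[] data, int start, int end) {
        int pos = start;
        if (start != 0) {
            // Back up one byte, then skip through the next newline. If the
            // split begins exactly at a line start, only that newline is
            // consumed and no real line is lost.
            pos = start - 1;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> lines = new ArrayList<>();
        // A line is read whenever it STARTS before end, even if it runs past end.
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8));
            pos++; // step past the newline
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "aa\nbbb\ncc\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readSplit(data, 0, 4));
        System.out.println(readSplit(data, 4, 10));
    }
}
```

With the sample data, the split [0, 4) ends in the middle of "bbb" but still returns that whole line, and the split [4, 10) skips the partial line and returns only "cc", so every line is read exactly once.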

16. Having issue setting up Hadoop    stackoverflow.com

The issue I'm having is that when I run bin/hadoop fs -ls it prints out all the files of the local directory that I'm in and not the files in hdfs ...

17. Create Value class for Sequence Files at runtime    stackoverflow.com

I have some types of data that I have to upload on HDFS as Sequence Files. Initially, I had thought of creating a .jr file at runtime depending on the type ...

18. Wrapper for Hadoop Applications    stackoverflow.com

Does there exist a wrapper for Hadoop applications? I'm referring to a wrapper that would turn a Hadoop setup into a standalone application. I understand that this would defeat the purpose ...

19. How to manage dependencies in Java    stackoverflow.com

I'm totally new to Java and I'm trying to build a Hadoop MapReduce program. I added /usr/lib/hadoop/hadoop-core.jar and everything builds fine. When I export the project as a jar file and ...

20. Hadoop jar needs to read model file, and I got java.io.FileNotFoundException: data/ner/pattern.trd (No such file or directory)    stackoverflow.com

==============================Updated on 27, March, 2011============================= I have solved this problem, thanks for your attention. The command I use is as below:

hadoop jar ner-hadoop.jar -libjars lib/ner-lib.jar -archives data.tgz#data input output
Pay attention to this ...

21. Where do I install a jdbc driver on ubuntu?    stackoverflow.com

I'm trying to install the MS SQL JDBC driver on ubuntu to be used with sqoop for Hadoop. I'm totally new to java and linux, so I'm not sure where to ...

22. MultipleOutputFormat support with 'Job'    stackoverflow.com

I am trying to use MultipleOutputFormat with Hadoop 0.20.1, and it seems it only works with the deprecated 'JobConf', which in turn uses the deprecated Mapper and Reducer (org.apache.hadoop.mapred.Reducer) etc. Any ideas ...

23. Hadoop and video data    stackoverflow.com

Hadoop is perfect for storing large data that is not accessed in real time and can grow on commodity hardware. Is there an alternative or some system built on top of ...

24. hadoop based question    stackoverflow.com

Suppose I write a java program and i want to run it in Hadoop, then

  1. where should the file be saved?
  2. how to access it from hadoop?
  3. should i be calling it by the ...

25. hadoop based question    stackoverflow.com

Hey, I have a very basic doubt. Suppose I write a Java program and I want to run it in Hadoop, then 1. where should the file be saved? 2. how to access it from ...

26. When was the first version of Hadoop released?    stackoverflow.com

When was the first version of Hadoop released to the public? Any supporting links?
Edit
I should have been more clear - I'm asking this question because the Wikipedia article, the best ...

28. Hadoop 0.20.0 -- not able to write proper output to file    stackoverflow.com

Consider that I have an input file which contains email addresses. Each email address is on a new line. For simplicity, let's assume that the file contains email addresses belonging to ONLY ...

29. Need to do a POC on Hadoop.. need help    stackoverflow.com

Can someone give me an idea on how to startup with the POC on Hadoop ? I am very new to this concept.. any help would be greatly appreciated.

30. Any suggestions for reading two different dataset into Hadoop at the same time?    stackoverflow.com

Dear Hadoopers: I'm new to Hadoop, and recently tried to implement an algorithm. This algorithm needs to calculate a matrix, which represents the different ratings of every pair of songs. I already ...

31. hadoop CustomWritables    stackoverflow.com

I have more of a design question regarding the necessity of a CustomWritable for my use case: So I have a document pair that I will process through a pipeline and write ...

32. multiple outputs hadoop    stackoverflow.com

How can I change the code in the WordCount.java program in the examples such that the output of the WordCounts for each file is put on separate files. That is, instead ...

33. running another job in hadoop    stackoverflow.com

I don't understand how to make a job use the same output directory to write a different file in it. I have tried commenting and uncommenting this line, but it still doesn't ...

34. How to retrieve hadoop job configuration based on job id that is currently running?    stackoverflow.com

Is there any way to retrieve the job configuration (some property from the configuration) if I know the job id? Basically, what I'm doing is checking if there are any running jobs at the ...

35. Hadoop DistCp using wildcards?    stackoverflow.com

Is it possible to use DistCp to copy only files that match a certain pattern? For example. For /foo I only want *.log files.

36. Just enough Java for Hadoop    stackoverflow.com

I have been a C++ developer for about 10 years. I need to pick up Java just for Hadoop. I doubt I will be doing any thing else in Java. So, ...

37. Is there some distributed storage like Hadoop but with the advantages of ZFS?    stackoverflow.com

Is there some distributed storage like Hadoop but with the advantages of ZFS?

38. Use of thrift/avro for a hadoop job to communicate between Java and C++    stackoverflow.com

Right now we have a Hadoop job in Java that is working with some C++ binaries. We write files to NFS, and C++ and Java read them, and that's our form ...

39. Hadoop workload    stackoverflow.com

I am currently using the wordcount application in Hadoop as a benchmark. I find that the CPU usage is nearly constant at around 80-90%. I would like to have a fluctuating CPU ...

40. How do I prevent `hadoop fs rmr ` from creating $folder$ files?    stackoverflow.com

We're using Amazon's Elastic Map Reduce to perform some large file processing jobs. As a part of our workflow, we occasionally need to remove files from S3 that may already exist. ...

41. Why doesn't Hadoop file system support random I/O?    stackoverflow.com

Distributed file systems like Google File System and Hadoop don't support random I/O.
(They can't modify files that were written before. Only writing and appending are possible.) Why ...

42. contribution to apache hadoop    stackoverflow.com

I am interested in the development of Hadoop. I know Java, but what are the other prerequisites for contributing to Hadoop? Please tell me where to start.

43. contribution to apache hadoop    stackoverflow.com

I understand that to solve a bug in Hadoop, I first need to find that bug, and then I can solve it. Isn't there a list of bugs to be solved ...

44. Hadoop pipes problem    stackoverflow.com

I have configured Hadoop in pseudo-distributed mode (single-node cluster) on my Ubuntu 10.04. I have a problem running Hadoop Pipes code. My code is as follows:

#include "/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/include/hadoop/Pipes.hh"
#include "/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/include/hadoop/TemplateFactory.hh"
#include "/home/hadoop/project/hadoop-0.20.2/c++/Linux-amd64-64/include/hadoop/StringUtils.hh"


#include ...

45. hadoop single node setup    stackoverflow.com

I am trying to do a single node setup for Hadoop as given at the following link: http://hadoop.apache.org/common/docs/current/single_node_setup.html I have followed all the steps up to defining JAVA_HOME but the command ...

46. How to get datanode timeout?    stackoverflow.com

I have a 3 node hadoop setup, with replication factor as 2. When one of my datanode dies, namenode waits for 10 mins before removing it from live nodes. Till then my ...

47. How to call Partitioner in Hadoop v 0.21    stackoverflow.com

In my application I want to create as many as reducer jobs based on the keys. Now my current implementation writes all the keys and values in a single (reducer) output ...

48. Linker error with Hadoop Pipes    stackoverflow.com

Hadoop n00b here, just started playing around with Hadoop Pipes. I'm getting linker errors while compiling a simple WordCount example using hadoop-0.20.203 (current most recent version) that did not appear for ...

49. How can a Hadoop job kill itself    stackoverflow.com

Is there any way for a Hadoop job to kill itself, or to send a signal to kill it? I've read the configuration settings from JobConf, where it says that if a user specifies ...

50. Specify multiple input files for a hadoop job    stackoverflow.com

Is there a way to specify multiple input files for a Hadoop job? I've tried separating them with ',' but that didn't work... any other suggestions? I was able to do so... by writing my own ...

51. How to call input file which is already in the package    stackoverflow.com

In my Hadoop MapReduce application I have one input file. When I execute the jar of my application, I want the input file to be used automatically. To do this ...

52. Iterate twice on values    stackoverflow.com

I receive an iterator as argument and I would like to iterate on values twice.

public void reduce(Pair<String,String> key, Iterator<IntWritable> values,
            ...
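
One common way to get two passes over a single-pass iterator is to buffer the values during the first walk and then loop over the buffer. The sketch below shows this in plain Java under stated assumptions: `IterateTwiceSketch` and `shares` are hypothetical names, and `Integer` stands in for `IntWritable` so the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; Integer stands in for IntWritable.
final class IterateTwiceSketch {

    // Returns each value's share of the total, which needs two passes:
    // one to compute the total, one to divide each value by it.
    static double[] shares(Iterator<Integer> values) {
        // The iterator can be consumed only once, so copy its values first.
        List<Integer> cached = new ArrayList<>();
        while (values.hasNext()) {
            cached.add(values.next());
        }
        long total = 0;
        for (int v : cached) {          // first pass over the buffer
            total += v;
        }
        double[] result = new double[cached.size()];
        for (int i = 0; i < result.length; i++) {   // second pass
            result[i] = total == 0 ? 0.0 : cached.get(i) / (double) total;
        }
        return result;
    }

    public static void main(String[] args) {
        Iterator<Integer> values = Arrays.asList(1, 1, 2).iterator();
        System.out.println(Arrays.toString(shares(values)));
    }
}
```

Two caveats apply when carrying this over to a real reducer. Hadoop reuses the same Writable instance across calls to `next()`, so each element has to be copied (for example `new IntWritable(v.get())`) rather than stored by reference. The buffer also holds every value for the key in memory, which only works when the value list for a key is small enough.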

53. Hadoop testing using MRUnit    stackoverflow.com

I'm retrofitting a bunch of existing Hadoop unit tests that were previously run in an in-memory cluster (Using MiniMRCluster) into MRUnit. The existing test cases essentially provide input to the ...

54. Save reducers output directory path to a variable in Hadoop    stackoverflow.com

How do I save the output path of Hadoop reducers to a variable? This variable will be used by all other MR jobs. These jobs will be sequential. All the sequential MR jobs will ...

55. how to setup a hadoop node to be a tasktracker but not a datanode    stackoverflow.com

For a special reason, I want to setup a hadoop node to be a tasktracker but not a datanode. It seems like there is a way to do it but I ...

56. question related to hadoop pipes    stackoverflow.com

I'm new to Hadoop. All my questions are related to Hadoop Pipes, which uses C++, not Java. My questions are the following: Can anybody explain the usage of ...

57. Migrating A Java Application to Hadoop : Architecture/Design Roadblocks?    stackoverflow.com

Alrite.. so.. here's a situation: I am responsible for architect-ing the migration of an ETL software (EAI, rather) that is java-based. I'll have to migrate this to Hadoop (the apache version). ...

58. Can I use Hadoop to train a neural network?    stackoverflow.com

I want to train a neural network with the help of Hadoop. We know when training a neural network, weights to each neuron are altered every iteration, and each iteration depends ...

59. Search/Find a file and file content in Hadoop    stackoverflow.com

I am currently working on a project using Hadoop DFS.

  1. I notice there is no search or find command in Hadoop Shell. Is there a way to search and find a ...

60. Hadoop custom split of TextFile    stackoverflow.com

I have a fairly large text file that I would like to convert into a SequenceFile. Unfortunately, the file consists of Python code with logical lines running over several physical lines. ...

61. MultipleOutputs in Apache Hadoop 0.20.203    stackoverflow.com

How are users of Apache Hadoop 0.20.203 dealing with the lack of support for MultipleOutputs (reducers writing to multiple output files)? Older versions of Apache Hadoop support MultipleOutputs, but to use them ...

62. How to track which data block is in which data node in hadoop?    stackoverflow.com

If a data block is replicated, in which data node will it be replicated to? Is there any tool to show where the replicated blocks are present?

63. What are some good resources for studying Hadoop's source code?    stackoverflow.com

I'm currently a student planning on some in-depth study of Hadoop. Are there good resources (especially courses from universities, or papers published by researchers) that I can use to help me ...

64. Assessing and comparing Hadoop for Business Intelligence Design considerations     stackoverflow.com

I am considering various technologies for data warehousing and business intelligence, and have come upon this radical tool called Hadoop. Hadoop doesn't seem to be exactly built for BI purposes, but ...

65. fs.trash.root not working. Have I configured it wrongly?    stackoverflow.com

I am trying to evaluate the hadoop trash option. I used this property in core-site.xml

<property>
 <name>fs.trash.root</name>
 <value>hdfs://Machinename:8020/Trash</value>
</property>
before using this I tried with
<property>
 <name>fs.trash.root</name>
 <value>/Trash</value>
</property>
But under both these cases the Trash ...

66. Implementing parallel-for in hadoop    stackoverflow.com

I would like to implement a parallel-for on Hadoop. Basically, parallel-for receives a sub-skeleton (it could be a function like map()) and an integer as parameters. The sub-skeleton ...

67. Hadoop copyFromLocal problem with copying a directory    stackoverflow.com

I'd like to copy whole local directory with some subdirectories and files to HDFS. HDFS already contains the root directory and some subdirectories with files. I just want to add newer ...

68. rsync files to hadoop    stackoverflow.com

I have 6 servers and each contains a lot of logs. I'd like to put these logs to hadoop fs via rsync. Now I'm using fuse and rsync writes directly to ...

69. Computing set intersection and set difference of the records of two files with hadoop    stackoverflow.com

Sorry for cross-posting this on the Hadoop user mailing list and here, but this is becoming an urgent matter for me. My problem is as follows: I have two input files, and I ...

70. Hadoop - executing multi-Map-jobs    stackoverflow.com

I have an application that only implements the Map function. I'm creating 1000 jobs, each with a unique PrefixFilter. Example:

public void startNewScan(String prefix, long endTime)
    Job job = new Job(conf, "MyJob");
 ...

71. How can I get the Hadoop command-line scripts working on Win32?    stackoverflow.com

I'm trying to build a Hadoop development environment on my Windows XP 32bit environment. When I try to run one of the utilities I get an error message (see screenshot below). ...

72. Harmonize protobuf-net bcl.Guid's HI/LO with sql uniqueidentifiers for correlated subquerying?    stackoverflow.com

Is there any standard / boilerplate way to convert SQL uniqueidentifiers into the same HI/LO blocks as protobuf-net's BCL.Guids? I have a ton of data in SQL Server that I need to ...

73. Can Hadoop Help Solve Our Analytical Problems    stackoverflow.com

I've never used Hadoop and am still in the process of filling in the blanks of how it can be implemented to help our business. We probably have only a small ...

74. Best way to pre-process text messages using Hadoop    stackoverflow.com

I am using Hadoop to process text messages (SMS), but I am not sure of the best way to pre-process this data so that I can do an efficient search. For example, ...

75. Hadoop 0.20.203 doesn't load configuration files    stackoverflow.com

Everything was fine, but suddenly Hadoop started to ignore configuration files such as core-site.xml and set all contents of Configuration objects to default values. The HADOOP_HOME and HADOOP_CONF_DIR variables are set properly. ...

76. How to install hadoop on windows    stackoverflow.com

I am trying to install Hadoop on my Windows machine with the help of the following link, the YDN Hadoop Tutorial, http://developer.yahoo.com/hadoop/tutorial/module3.html, but I am not able to find

  • hadoop.job.ugi by ...

77. Anyone ever used GreenPlum?    stackoverflow.com

Not sure this is the right place to ask about it, but is there anyone with any knowledge about Greenplum For big data?


I have been researching about DBMS ...

78. TaskMemoryManager is disabled    stackoverflow.com

I am trying to run the TaskTracker on Cygwin, but the following error occurs: mapred.TaskTracker: Process Tree implementation is missing on this system. TaskMemoryManager is disabled. Everything else (i.e. NameNode, SecondaryNameNode, JobTracker and DataNode) is working properly ...

79. Temporary failure in name resolution while running Hadoop/bin/start-all.sh    stackoverflow.com

I got "Temporary failure in name resolution" while running Hadoop/bin/start-all.sh on my SUSE Linux. I have searched many websites for the problem, but cannot find an effective answer. I look ...

80. Hadoop custom Writeable vs. second pass    stackoverflow.com

I'm working on parsing a large dataset which uses a record that has primary and secondary keys:

  • Primary Key
  • Secondary Key
  • Additional fields
Primary-Secondary mapping is one-to-many (primary being the 'one'). I'd like to ...

81. Is hadoop's job ThreadSafe?    stackoverflow.com

Does anyone know if org.apache.hadoop.mapreduce.Job is thread-safe? In my application I create a thread for each job, and then waitForCompletion. And I have another monitor thread that checks every job's state with ...

82. Mkdirs failed to create hadoop.tmp.dir    stackoverflow.com

I've upgraded from Apache Hadoop 0.20.2 to the newest stable release; 0.20.203. While doing that, I've also updated all configuration files properly. However, I am getting the following error while trying ...

83. Hadoop read from standard input stream    stackoverflow.com

I want my map-reduce program to read from standard input stream (System.in) For example in the run() method, how can I make my program read from System.in instead of a file ...

84. Time taken for Hadoop job to execute    stackoverflow.com

Is there an API to figure out how long a Hadoop job took to execute (exactly -> no hacks.)?

85. In Hadoop , how to get the instance of currently running Jobtracker?    stackoverflow.com

I am working on a Monitoring Tool for Hadoop. I need to get the currently running jobtracker. How can I get that?

86. How to run the hadoop simple program through command line    stackoverflow.com

I'm new to the Hadoop technologies. How do I run a simple program through the command line? I'm using a Windows environment. I installed Cygwin. Can you help me ...

87. Select DB, OLAP solutions for fast web analytics (large data array)    stackoverflow.com

I have the following problem: my system collects ~300M hits daily from different sites. Each hit has a time, user id, type (ad or usual), HTTP address, and site id. There is also an ...

88. Hadoop order of operations    stackoverflow.com

According to the attached image found in Yahoo's Hadoop tutorial, the order of operations is map > combine > partition, which should be followed by reduce. Here is an example key ...

89. hadoop benchmark - terasort    stackoverflow.com

I built my own 4-node (namenode + 3 x datanodes) cluster for Hadoop.
Now I am trying to test its performance: took me 71 seconds:
hadoop jar $HADOOP_INSTALL/hadoop-examples.jar randomwriter random-data -test.randomwrite.bytes_per_map=5000000 -Dtest.randomwrite.total_bytes=50000000 took me ...

90. how to use hadoop for a web application?    stackoverflow.com

I am working on a social networking web-based application, which uses the Apache web server and a MySQL server for the database with the CodeIgniter MVC framework. I don't know how to integrate ...

91. integrate pentaho community with hadoop    stackoverflow.com

I want to integrate Hadoop with Pentaho Data Integration. On the Pentaho site I found Pentaho for Hadoop, but it's commercial. I want to make my Data Integration community edition to ...

92. hadoop datanode rack awareness setting    stackoverflow.com

I am building a Hadoop cluster. I have 3 racks, and each rack consists of several virtual machines. How do I configure Hadoop so that it has rack awareness? Some suggest using "topology.script.file.name" to ...

93. stackoverflow with FsUrlStreamHandlerFactory    stackoverflow.com

I need to use a Java URL to access files on HDFS. I've seen that I must use this: static { URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory()); } but, when I launch my ...

94. Is there something like Hadoop, but based on GPU?    stackoverflow.com

Is there something like Hadoop, but based on GPU? I would like to do some research on distributed computing. Thank you for your help! Yik,

95. error running hadoop project    stackoverflow.com

I'm running a Hadoop project and it shows the following:

11/08/16 12:36:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/16 12:36:10 INFO mapred.FileInputFormat: Total input paths ...

96. Extracting content from the Hadoop Text object    stackoverflow.com

I am working with a large text inside a Text object from the Hadoop ( 0.20.203.0 ) Java library. I need to extract XML content from it without converting the whole ...

97. Contribution to Hadoop scheduler    stackoverflow.com

I am working on the design of a new Hadoop scheduler. I want to go through the Hadoop scheduler code (FIFO, FAIR and CAPACITY). The source code of Hadoop is huge. Can ...

98. Hadoop Job Scheduling query    stackoverflow.com

I am a beginner to Hadoop.

    As per my understanding, Hadoop framework runs the Jobs in FIFO order. ( default scheduling).

    Is there any way ...

99. Hadoop on OSX "Unable to load realm info from SCDynamicStore"    stackoverflow.com

I am getting this error on startup of Hadoop on OSX 10.7:

Unable to load realm info from SCDynamicStore put: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/travis/input/conf. Name node is in ...

100. Hadoop upgrade issue with web monitoring    stackoverflow.com

I just upgraded from 0.19 to 0.20. Everything seems fine; however, the web monitoring tool doesn't work any more: http://mydomain.com:50070/webapps/hdfs/dfshealth.jsp gives me a 404. The same goes for the job tracking tool. Any idea where to ...