hadoop 3 « hadoop « Java Database Q&A

1. Hadoop job fails with ClassCastException    stackoverflow.com

I am running a job that fails with a ClassCastException in the Mapper. I have tried setting the Mappers and JobConf correctly, but I continue to get the error. Here is my code: [1] ...

2. Is there a good online tutorial for Hadoop development on a Windows 7 machine?    stackoverflow.com

I've been following the awesome Yahoo! Hadoop tutorial, which worked great for getting a virtual machine environment set up (Module 3 of the tutorial). But now I'm getting ...

3. Hadoop Code - Git and SVN    stackoverflow.com

All the Apache Hadoop Code is hosted in SVN. How does Git help in Hadoop development process? It's not clear from the below article. http://wiki.apache.org/hadoop/GitAndHadoop

4. How to dynamically change existing files' block size in Hadoop?    stackoverflow.com

I have a Hadoop cluster running. I use the Hadoop API to create files in Hadoop, for example using: create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress). I ...

5. How to exclude duplicate records from a large data feed?    stackoverflow.com

I have started working with a large dataset that is arriving in JSON format. Unfortunately, the service providing the data feed delivers a non-trivial number of duplicate records. On ...

6. Sequential Files in Hadoop    stackoverflow.com

How do I read/parse a Sequential File written by a previous MapReduce job? The keyOut and valueOut of the previous MR job were Text and ByteWritable. What should be the keyin and valuein ...

7. Initialize public static variable in Hadoop through arguments    stackoverflow.com

I have a problem with changing public static variables in Hadoop. I am trying to pass some values as arguments to the jar file from the command line. Here is my code:

public class ...

8. Why do we need to set the output key/value class explicitly in the Hadoop program?    stackoverflow.com

In the "Hadoop : The Definitive Guide" book, there is a sample program with the below code.

JobConf conf = new JobConf(MaxTemperature.class);  
conf.setJobName("Max temperature");  
FileInputFormat.addInputPath(conf, new Path(args[0]));  
FileOutputFormat.setOutputPath(conf, new ...

9. Hadoop job asks to disable safe mode    stackoverflow.com

A Hadoop job is asking me to disable safe mode manually. It says the resources are not available. How do I disable safe mode?

10. How to control file assignment to different slaves in a Hadoop distributed system?    stackoverflow.com

  1. How to control file assignment to different slaves in a Hadoop distributed system?
  2. Is it possible to write 2 or more files in Hadoop simultaneously as a MapReduce task?
I am new to Hadoop. It ...

11. Is there a way to "set" Hadoop Counter instead of incrementing it?    stackoverflow.com

The API only provides methods to increment a counter in a Mapper or Reducer. Is there a way to just set it, or to increment its value only once irrespective of the number of ...

12. Change default configuration on Hadoop slave nodes?    stackoverflow.com

Currently I am trying to pass some values through command-line arguments and then parse them using GenericOptionsParser with Tool implemented. From the master node I run something like this:

bin/hadoop jar MYJAR.jar ...

13. java.lang.NoClassDefFoundError when reading hadoop SequenceFile    stackoverflow.com

I am trying to read a SequenceFile with a custom Writable in it. Here's the code:

public static void main(String[] args) throws IOException {
    //String iFile = null;
    ...

14. Running JNI code calling cuda code on hadoop    stackoverflow.com

I'm trying to use a native method to call CUDA code on Hadoop. It loads the .so file successfully, but when I call the CUDA code in the main function, the following error occurs.

Exception ...

15. How can I inspect a Hadoop SequenceFile for which I lack full schema information?    stackoverflow.com

I have a compressed Hadoop SequenceFile from a customer which I'd like to inspect. I do not have full schema information at this time (which I'm working on separately). But in the ...

16. Convert DataInput to DataInputStream?    stackoverflow.com

How can I convert DataInput to DataInputStream in java? I need to know the size of the DataInput.

17. Controlling number of lines to be written to the output file    stackoverflow.com

I am new to Hadoop programming. I have a situation in which I want to stop writing <k3,v3> to my output file after n-lines. In my program, I am sure that the output ...

18. Hadoop Mac OS installation woes    stackoverflow.com

So I'm trying to install Hadoop on Mac OS X Leopard following the steps in this note: Running Hadoop on a OS X Single Node Cluster. I reached Step 4: ...

19. Brisk for small files    stackoverflow.com

I am a newbie to Cassandra and Hadoop. While looking into integrating the two products, I came across Brisk. From the description I understand that Brisk replaces HDFS with CassandraFS. ...

20. Fastest access of a file using Hadoop    stackoverflow.com

I need the fastest access to a single file, several copies of which are stored on many systems using Hadoop. I also need to find the ping time for each file in ...

21. Building hadoop using ant    stackoverflow.com

I tried to build hadoop-mapreduce-project using Ant. It succeeded with Maven, but I need to build it with Ant. Or is there any alternative to "ant compile-mapred-test" in the Maven build? ...

22. Does java api for hadoop writing require SSH?    stackoverflow.com

Hi guys: I'm trying to set up writes to a remote, single-node Hadoop instance (remote in that it's running on my box in a VM). However, I'm getting ...

23. How to configure Solr with Hadoop?    stackoverflow.com

How can I configure Solr with Hadoop? Do I only need to put the data folder inside Hadoop?

24. How to uninstall Hadoop?    stackoverflow.com

I am using Mac OS X and want to uninstall and cleanly reinstall Hadoop. Please let me know how I can do that. Thank you.

25. Hadoop: split a file into equal sizes    stackoverflow.com

I'm trying to learn how to divide a file stored in HDFS into splits and read them in different processes (on different machines). What I expect is that if I have a SequenceFile containing ...

26. How to use Hadoop API copyMerge function? What is the addString parameter?    stackoverflow.com

Does anyone know, or has anyone used, the copyMerge function in the Hadoop API's FileUtil?

copyMerge(FileSystem srcFS, Path srcDir, FileSystem dstFS, Path dstFile, boolean deleteSource, Configuration conf, String addString);
In the function, what is the ...

27. Variants of Hadoop    stackoverflow.com

A project of mine is to compare different variants of Hadoop. It is said that there are many of them out there, but googling didn't work well for me :( Does anyone ...

28. How to make Hadoop use all the cores on my system?    stackoverflow.com

I have a 32 core system. When I run a MapReduce job using Hadoop I never see the java process use more than 150% CPU (according to top) and it usually ...

29. How to overwrite/reuse the existing output path for a Hadoop job again and again    stackoverflow.com

I want to overwrite/reuse the existing output directory when I run my Hadoop job daily. The output directory will store the summarized output of each day's job run. If I specify ...

30. Hadoop outputCollector    stackoverflow.com

I have a MapReduce program that is working fine. The following are the signatures of the map and reduce functions. The OutputCollector call is presently

output.collect(newtext, new IntWritable(someintegervalue like 5)); //works ok
I need to ...

31. Hadoop & Bash: delete filenames matching range    stackoverflow.com

Say you have a list of files in HDFS with a common prefix and an incrementing suffix. For example,

part-1.gz, part-2.gz, part-3.gz, ..., part-50.gz
I only want to leave a few files in ...

32. Using GCJ to compile Hadoop RandomWriter    stackoverflow.com

I'm trying to compile a gcj version of Hadoop's RandomWriter. It compiles successfully, but when I try to run the resulting executable I get the following output:

anj3@anj3server:~/Downloads/hadoop/hadoop-0.21.0$ gcj -fjni --main=org.apache.hadoop.examples.RandomWriter -findirect-dispatch ...

33. SortByTemperatureUsingHashPartitioner NullPointerException    stackoverflow.com

Has anybody successfully run the SortByTemperatureUsingHashPartitioner from "Hadoop The Definitive Guide." book ? Mine crashed. Does anyone know why?

hadoop jar myjob.jar SortByTemperatureUsingTotalOrderPartitioner -D mapred.reduce.tasks=30 input/ncdc/all-seq output-totalsort
11/10/15 14:32:40 INFO security.Groups: Group mapping ...

34. Struggling with scripting    stackoverflow.com

I don't have much experience writing shell scripts, but I have to write a script to run a Java program on a cloud using Hadoop. I have 2 scripts called ...

35. Kerberos with Hadoop, error: javax.security.sasl.SaslException: GSS initiate failed    stackoverflow.com

I configured Kerberos to work with Hadoop. Since I use Cloudera CDH3, I configured it according to Cloudera's guidelines. (Kerberos version is 1.8.4.) All nodes start up normally, but ...

36. Apache Hadoop - Excluding files when corrupt    stackoverflow.com

I process several server logfiles (around 40) and collect a bunch of metrics using Apache Hadoop. If one or more of those files are inconsistent or corrupted, I would like to ...

37. hadoop master node slave node datanode    stackoverflow.com

I am Riyas and new to Hadoop. If a master node goes down, what happens to the cluster? Can any slave node act as a master? Does it need any additional ...

38. What is an RPC port and how is it relevant to connecting to Hadoop?    stackoverflow.com

I'm not much of a networking type. I'm trying to understand how to debug a Hadoop connection, and the connection relies on an RPC port. Any insights into ...

39. Hadoop Pipes cannot find shared libraries    stackoverflow.com

I am getting this error while running a hadoop pipes program. The program compiles successfully but fails on hadoop pipes.

error while loading shared libraries: Lib.so.0: cannot open shared object file: No ...

40. not able to communicate with the client using ssh    stackoverflow.com

I am trying to set up a Hadoop cluster, but I am unable to access the slave machine using ssh, though I am able to ssh to localhost. I have tried the ...

41. Hadoop Hello World Example And Introduction    stackoverflow.com

I've been hearing a lot about Apache Hadoop as an awesome way to do processing-intensive tasks. Looking for a really basic introduction to Hadoop. Like the helloworld equivalent, and then ...

42. Specifying memory limits with hadoop    stackoverflow.com

I am trying to run a high-memory job on a Hadoop cluster (0.20.203). I modified the mapred-site.xml to enforce some memory limits.

  <property>
    <name>mapred.cluster.max.map.memory.mb</name>
   ...

43. Will hadoop support multiple threads in local mode?    stackoverflow.com

When running multiple threads in hadoop in parallel, some jobs fail randomly. Also there are exceptions like ChecksumException and SaxParserException(Premature end of file). Tried many ways to fix these but couldn't ...

44. Turn off replication only for Hadoop job output    stackoverflow.com

Is there a way to set the replication factor for the output of a specific MapReduce job to be different than the rest of the cluster (say 1)? I'd like my ...

45. Error while svn checkout http://svn.apache.org/repos/asf/hadoop/common/trunk    stackoverflow.com

I am trying to install Hadoop on my Ubuntu box, but encounter the error below during checkout:

svn[options] could not connect to server http://svn.apache.org
Any idea why ...

46. Hadoop: How to compile libhdfs.so?    stackoverflow.com

We are using Hadoop through the Hadoop C/C++ API (libhdfs.so). We use the latest stable Hadoop version which is 0.20.203. Unfortunately, there are no clear (and up to date) instructions to ...

47. Deploying custom MBeans to Hadoop    stackoverflow.com

I'm starting development of a Hadoop application and I'd like to manage it via a couple of MBeans. I've experimented with using MBeanUtils.register and MBeanServer's register method in jar files ...

48. Localhost-only pseudo-distributed hadoop installation    stackoverflow.com

I am trying to make a pseudo-distributed Hadoop installation on my Gentoo machine. I want nothing to be visible from the outside network - e.g. jobtracker and namenode web interfaces - ...

49. Hadoop read multiple lines at a time    stackoverflow.com

I have a file in which a set of every four lines represents a record. eg, first four lines represent record1, next four represent record 2 and so on.. How can I ensure ...

50. NLinesInputFormat Alternative in Hadoop 0.20?    stackoverflow.com

I am working with Hadoop 0.20, and wish to use the NLinesInputFormat, but this functionality isn't present? Is there an alternative? Here's what I'm trying to do: Records in the data span multiple lines, ...

51. Restrict number of concurrent reducers per user    stackoverflow.com

Is there a way to restrict the number of concurrent reduce slots per user in hadoop? We want to ensure no single user is using up all available reduce slots at ...

52. How to get file size    stackoverflow.com

I am running a Hadoop job. I have a FileSystem object and a Path object, and I want to know the size of the file (Path). Any idea?

53. Understanding Hadoop Simulator Mumak    stackoverflow.com

Recently I was trying to understand the workings of Mumak (see, e.g., MAPREDUCE-728). It basically takes a job trace and a topology trace and simulates Hadoop. I couldn't understand how it assigns ...

54. Documentation Generator for Big Data Analytics    stackoverflow.com

I am wondering what tools do people use for generating documentation for Big Data analytics. By that I mean aggregating, ranking, clustering, etc. multi-terabyte data sets using things such as Hadoop, ...

55. Hadoop and analytics?    stackoverflow.com

I'm in the process of building a complete 'scale-out'able solution to provide in-depth realtime analytics to our customers. The customers mainly have up to 200 servers, each having at most 400 sessions ...

56. Hadoop word count example fails with 'not a SequenceFile'. How to set file format?    stackoverflow.com

I'm trying to run hadoop jar /usr/lib/hadoop/hadoop-examples.jar aggregatewordcount /data/gutenberg/huckfinn.txt output/guten4 but get an error "huckfinn.txt not a SequenceFile". I read on other sites, and see in the source ...

57. How to use toArray() method in ArrayWritable - Hadoop    stackoverflow.com

There is a toArray() method in the ArrayWritable class in Hadoop, which should mean: convert this ArrayWritable to an array. But its signature is:

public Object toArray()
So how should we ...

58. Neural Network training in parallel, better to use Hadoop or a gpu?    stackoverflow.com

I need to train a neural network with 2-4 hidden layers, not sure yet on the structure of the actual net. I was thinking to train it using Hadoop map reduce ...

59. .Net and Hadoop - What to know / learn and what is available?    stackoverflow.com

Information

My question is regarding BigData in .Net. BigData is used to store and query huge amounts of data (Facebook, Google, Twitter, ...). Examples of BigData are MapReduce, Hadoop, Dryad, ... Microsoft dropped ...

60. Why do Column oriented databases such as Vertica/InfoBright/GreenPlum make a fuss of Hadoop?    stackoverflow.com

What is the point in feeding a Hadoop cluster and using that cluster to feed data into a Vertica/InfoBright data warehouse? All these vendors keep saying "we can connect with Hadoop", but ...

61. Error running Hadoop pipes Program: "Server failed to authenticate"    stackoverflow.com

While trying to run a C++ program based on this (link) on my Hadoop cluster, I got the error mentioned below. I referred to related posts (this) regarding this ...

62. Hadoop Global Property Conf.Set / Conf.Get in Cleanup()?    stackoverflow.com

I am trying to use Global Variables in Hadoop via the Conf.set() and Context.getConfiguration().get() methods. However, these don't seem to be working inside a Cleanup method I'm using - Though I am ...

63. Configuring a slave's hostname using internal IP - Multiple NICs    stackoverflow.com

In my Hadoop environment, I need to configure my slave nodes so that when they communicate in the middle of a map/reduce job they use the internal IP instead of the ...

64. Hadoop: How to unit test FileSystem    stackoverflow.com

I want to run a unit test, but I need to have an org.apache.hadoop.fs.FileSystem instance. Is there any mock or other solution for creating a FileSystem?

65. Hadoop and compression    coderanch.com

Hi all, I am pretty new to HDFS and was looking for some opinions on some conflicting answers I have recently gotten. 1. Is it a good idea to compress the stream when writing the file out to Hadoop? One person told me they got a 10x benefit from doing this. Another told me that it was bad to compress ...

66. Hadoop in the cloud    coderanch.com

I checked only the possibility of using Hadoop in the cloud, and I found some EC2 scripts which handle instance startup. I'm not sure if it is possible to increase the size of a cluster dynamically. Currently I see some static configuration files which control the number of nodes in the cluster. Since the pricing model of EC2 instances is hourly ...

67. Hadoop Rocks    coderanch.com

Hi, We have been using Hadoop for the past 6 months. It has changed the way we think about programming, not to mention the immense performance improvements. A few queries for Chuck: Which Hadoop distribution will you be targeting, 0.20.2? Do you also cover unit testing for MapReduce programs? This is one area where not much information and guidelines ...

68. Data stores used in Hadoop in Action    coderanch.com

69. Why Hadoop needs its own file system?    coderanch.com

Hadoop provides many interfaces to its filesystems, and it generally uses the URI scheme to pick the correct filesystem instance to communicate with. Although it is possible (and sometimes very convenient) to run MapReduce programs that access any of these filesystems, when you are processing large volumes of data, you should choose a distributed filesystem that has the data locality optimization, ...
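
The scheme-based selection described in this answer can be pictured with a small sketch in plain Java, with no Hadoop dependency. The `SchemeLookup` class, its registry contents, and the `describe` method are hypothetical names made up for illustration; in Hadoop itself the lookup goes through `FileSystem.get(URI, Configuration)`, which this sketch does not reproduce.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class SchemeLookup {
    // Hypothetical registry mapping a URI scheme to a filesystem description.
    // These entries are illustrative assumptions, not Hadoop's own configuration.
    private static final Map<String, String> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("hdfs", "distributed filesystem");
        REGISTRY.put("file", "local filesystem");
    }

    // Picks a description based only on the scheme part of the location.
    public static String describe(String location) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null) {
            scheme = "file"; // assumption: scheme-less paths are treated as local
        }
        String found = REGISTRY.get(scheme);
        return found != null ? found : "unknown scheme: " + scheme;
    }

    public static void main(String[] args) {
        System.out.println(describe("hdfs://namenode:8020/data/input"));
        System.out.println(describe("file:///tmp/input"));
    }
}
```

The host name `namenode`, port `8020`, and the paths are placeholders. The point is only that the part before `://` decides which implementation handles the rest of the path.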

70. Hadoop in Mac    coderanch.com

It's definitely possible to install Hadoop on a Mac. In fact, almost every developer you see at a Hadoop conference is carrying a Mac :P To be more specific, Hadoop is targeted at running on Unix and has several modes of operation. In production ("fully distributed mode"), it runs on a cluster of Unix machines, which are usually cheap Linux boxes. ...

71. Hadoop usage examples    coderanch.com

Hadoop is targeted at developing programs to process large data sets. It's useful whenever you have a lot of data to process or analyze. The first Hadoop application for many web companies is to analyze log data. For example, you can look at log data to see how many unique viewers you have and where they tend to come from. ...

73. new in hadoop    coderanch.com

Hi, I have only just heard about Hadoop. Reading the sample chapter from Hadoop in Action made me interested, and I have some questions: 1. Is it an extendable framework? 2. Are there any other similar frameworks? If yes, how does their performance compare? 3. Can it run programs written in languages other than Java?

74. Hadoop with Drools    coderanch.com

75. Is HADOOP complicated?    coderanch.com

Yes. I wrote the book because I heard the same frustrations from many people. Hadoop has a steep learning curve not because it's complicated, but because it's novel. Also, like many open source projects, a lot of the documentation is organized for reference rather than for learning. I intend my book for the general Java programmer with no background in distributed ...

76. Hadoop Architecture    coderanch.com

77. Hadoop in enterprise    coderanch.com

Search engines are about retrieval. Hadoop, with its MapReduce framework, is about data processing. Every search engine has a data processing requirement before the data is indexed, etc. Really big search engines need really big data processing frameworks. Hadoop is one. But data processing is not limited to search index processing; there are plenty of ...

78. Hadoop - mean time to productivity    coderanch.com

I've seen a number of courses in universities where students are expected to get up to speed on Hadoop in about 2-4 weeks. My memory is a bit vague on this one, but I do remember somewhere that a mid-term homework assignment was to implement PageRank over Wikipedia articles using Hadoop. I would certainly consider that a "comfortable" level. Of course, ...

79. Usage of Hadoop    coderanch.com

80. Hadoop testing/deployment/learning on your own    coderanch.com

As someone who doesn't use Hadoop, at least not yet, it seems to me that to really get a feel for setting up, managing, and testing an implementation of Hadoop you need to have a multiple machine setup. You can't mimic real world use cases if you're running it on one machine. Arguably it's not even helpful to set it up ...

81. What's Hadoop?    coderanch.com

82. * Winners: Hadoop in Action    coderanch.com