I am running a job which is failing with a ClassCastException in the Mapper. I have tried setting the Mappers and JobConf correctly, but I continue to get the error. Here is my code:
[1] ... |
I've been following the awesome Yahoo! Hadoop tutorial, which worked great for getting a virtual machine environment set up (Module 3 of the tutorial). But now I'm getting ... |
All the Apache Hadoop code is hosted in SVN. How does Git help in the Hadoop development process? It's not clear from the article below.
http://wiki.apache.org/hadoop/GitAndHadoop
|
I have a Hadoop cluster running. I use the Hadoop API to create files in Hadoop.
For example using: create(Path f, boolean overwrite, int bufferSize, short replication, long blockSize, Progressable progress).
I ... |
I have started working with a large dataset that is arriving in JSON format. Unfortunately, the service providing the data feed delivers a non-trivial number of duplicate records. On ... |
How do I read/parse a SequenceFile written by a previous MapReduce job? The keyOut and valueOut of the previous MR job were Text and ByteWritable. What should be the keyin and valuein ... |
I have a problem with changing public static variables in Hadoop.
I am trying to pass some values as arguments to the jar file from the command line.
Here is my code:
public class ...
|
|
In the book "Hadoop: The Definitive Guide", there is a sample program with the code below.
JobConf conf = new JobConf(MaxTemperature.class);
conf.setJobName("Max temperature");
FileInputFormat.addInputPath(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new ...
|
My Hadoop job is asking me to disable safe mode manually. It says the resources are not available. How do I disable safe mode?
|
- How can I control which slave nodes files are assigned to in a Hadoop distributed system?
- Is it possible to write two or more files in Hadoop simultaneously from a MapReduce task?
I am new to Hadoop. It ... |
The API only provides methods to increment a counter in a Mapper or Reducer. Is there a way to just set it, or to increment its value only once irrespective of the number of ... |
Currently I am trying to pass some values through command line arguments and then parse them using GenericOptionsParser with Tool implemented.
From the master node I run something like this:
bin/hadoop jar MYJAR.jar ...
|
I am trying to read a SequenceFile with a custom Writable in it.
Here's the code:
public static void main(String[] args) throws IOException {
//String iFile = null;
...
|
I'm trying to use a native method to call CUDA code on Hadoop. It loads the .so file successfully. But then, when I call the CUDA code in the main function, the following error occurs.
Exception ...
|
I have a compressed Hadoop SequenceFile from a customer which I'd like to inspect. I do not have full schema information at this time (which I'm working on separately).
But in the ... |
How can I convert a DataInput to a DataInputStream in Java?
I need to know the size of the DataInput.
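One possible approach, sketched here with the standard library only: `DataInput` is an interface and carries no size of its own, but when the object behind it happens to be an `InputStream` (as `DataInputStream` is), `available()` gives an estimate of the bytes left. The class and method names below are hypothetical, and `available()` is only exact for in-memory streams such as `ByteArrayInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class DataInputSize {
    /**
     * Returns an estimate of the bytes remaining in the given DataInput,
     * or -1 if it is not backed by an InputStream or the check fails.
     * (Hypothetical helper; not part of any Hadoop API.)
     */
    static int remainingBytes(DataInput in) {
        if (in instanceof InputStream) {
            try {
                return ((InputStream) in).available();
            } catch (IOException e) {
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // A 10-byte in-memory buffer wrapped in a DataInputStream.
        DataInput in = new DataInputStream(new ByteArrayInputStream(new byte[10]));
        System.out.println(remainingBytes(in)); // bytes left before reading
        in.readInt();                           // consumes 4 bytes
        System.out.println(remainingBytes(in)); // bytes left after reading
    }
}
```

For network or file-backed streams the value is only a hint, so it should not be relied on as the total size.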
|
I am new to Hadoop programming.
I have a situation in which I want to stop writing <k3,v3> to my output file after n-lines.
In my program, I am sure that the output ... |
So I'm trying to install Hadoop on Mac OS X Leopard following the steps in this note: Running Hadoop on a OS X Single Node Cluster.
I reached Step 4: ... |
I am a newbie to Cassandra and Hadoop. While looking into integrating the two products, I came across Brisk. From the description I understand that Brisk replaces HDFS with CassandraFS. ... |
I need the fastest access to a single file, several copies of which are stored in many systems using Hadoop. I also need to find the ping time for each file in ... |
I tried to build hadoop-mapreduce-project using Ant. I tried with Maven and it succeeded, but I need to build it with Ant. Or is there any alternative to "ant compile-mapred-test" in the Maven build? ... |
Hi guys: I'm trying to set up writes to a remote, single-node Hadoop instance (remote in that it's running on my box in a VM)....
However, I'm getting ... |
How can I configure Solr with Hadoop? Do I only need to put the data folder inside Hadoop?
|
I am using Mac OS X and want to uninstall and cleanly re-install Hadoop.
Please let me know how I can do that.
Thank you.
|
I'm trying to learn how to divide a file stored in HDFS into splits and read them in different processes (on different machines).
What I expect is if I have a SequenceFile containing ... |
Does anyone know, or has anyone used, the copyMerge function in the Hadoop API (FileUtil)?
copyMerge(FileSystem srcFS, Path srcDir, FileSystem dstFS, Path dstFile, boolean deleteSource, Configuration conf, String addString);
In the function, what is the ... |
A project of mine is to compare different variants of Hadoop. It is said that there are many of them out there, but googling didn't work well for me :(
Does anyone ... |
I have a 32 core system. When I run a MapReduce job using Hadoop I never see the java process use more than 150% CPU (according to top) and it usually ... |
I want to overwrite/reuse the existing output directory when I run my Hadoop job daily.
The output directory will store the summarized output of each day's job run.
If I specify ... |
I have a MapReduce program that is working fine. Following are the signatures of the map and reduce functions. The OutputCollector call presently is
output.collect(newtext, new IntWritable(someintegervalue like 5)); //works ok
I need to ... |
Say you have a list of files in HDFS with a common prefix and an incrementing suffix. For example,
part-1.gz, part-2.gz, part-3.gz, ..., part-50.gz
I only want to leave a few files in ... |
I'm trying to compile a gcj version of Hadoop's RandomWriter. It compiles successfully, but when I try to run the resulting executable I get the following output:
anj3@anj3server:~/Downloads/hadoop/hadoop-0.21.0$ gcj -fjni --main=org.apache.hadoop.examples.RandomWriter -findirect-dispatch ...
|
Has anybody successfully run the SortByTemperatureUsingHashPartitioner from the book "Hadoop: The Definitive Guide"? Mine crashed. Does anyone know why?
hadoop jar myjob.jar SortByTemperatureUsingTotalOrderPartitioner -D mapred.reduce.tasks=30 input/ncdc/all-seq output-totalsort
11/10/15 14:32:40 INFO security.Groups: Group mapping ...
|
I don't have much experience with writing shell scripts, but I have to write a script to run a Java program on a cloud using Hadoop. I have 2 scripts called ... |
I configured Kerberos to work with Hadoop. Since I use Cloudera CDH3, I configured it according to the Cloudera guideline.
(Kerberos version is 1.8.4)
All nodes can start up normally, but ... |
I process several server logfiles (around 40) and collect a bunch of metrics using Apache Hadoop. If one or more of those files are inconsistent or corrupted, I would like to ... |
I am Riyas and new to Hadoop. If a master node goes down, what happens to the cluster? Can any slave node act as a master? Does it need any additional ... |
I'm not much of a networking type. I'm trying to understand how to debug a Hadoop connection, and the connection relies on an RPC port. Any insights into ... |
I am getting this error while running a hadoop pipes program. The program compiles successfully but fails on hadoop pipes.
error while loading shared libraries: Lib.so.0: cannot open shared object file: No ...
|
I am trying to set up a Hadoop cluster, but I am unable to access the slave machine using ssh, though I am able to ssh to localhost. I have tried the ... |
I've been hearing a lot about Apache Hadoop as an awesome way to do processing-intensive tasks. Looking for a really basic introduction to Hadoop. Like the helloworld equivalent, and then ... |
I am trying to run a high-memory job on a Hadoop cluster (0.20.203). I modified the mapred-site.xml to enforce some memory limits.
<property>
<name>mapred.cluster.max.map.memory.mb</name>
...
|
When running multiple threads in hadoop in parallel, some jobs fail randomly. Also there are exceptions like ChecksumException and SaxParserException(Premature end of file). Tried many ways to fix these but couldn't ... |
Is there a way to set the replication factor for the output of a specific MapReduce job to be different than the rest of the cluster (say 1)? I'd like my ... |
I am trying to install Hadoop on my Ubuntu box, but encounter the error below while checking out:
svn[options] could not connect to server http://svn.apache.org
Any idea why ... |
We are using Hadoop through the Hadoop C/C++ API (libhdfs.so). We use the latest stable Hadoop version which is 0.20.203. Unfortunately, there are no clear (and up to date) instructions to ... |
I'm starting development of a Hadoop application and I'd like to manage it via a couple of MBeans. I've experimented with using MBeanUtils.register and MBeanServer's register method in jar files ... |
I am trying to make a pseudo-distributed Hadoop installation on my Gentoo machine. I want nothing to be visible from the outside network - e.g. jobtracker and namenode web interfaces - ... |
I have a file in which a set of every four lines represents a record.
eg, first four lines represent record1, next four represent record 2 and so on..
How can I ensure ... |
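As a rough illustration of the record layout described in this question, the grouping step itself can be sketched in plain Java. This is only the batching logic under the assumption that a reader already sees the lines in order; it is not a Hadoop InputFormat, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FourLineRecords {
    // Number of physical lines per logical record, as stated in the question.
    static final int LINES_PER_RECORD = 4;

    /**
     * Groups consecutive lines into records of LINES_PER_RECORD lines each.
     * A trailing group with fewer lines is treated as incomplete and dropped.
     */
    static List<List<String>> group(List<String> lines) {
        List<List<String>> records = new ArrayList<>();
        for (int i = 0; i + LINES_PER_RECORD <= lines.size(); i += LINES_PER_RECORD) {
            records.add(new ArrayList<>(lines.subList(i, i + LINES_PER_RECORD)));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "a1", "a2", "a3", "a4",
                "b1", "b2", "b3", "b4",
                "c1"); // incomplete trailing record
        for (List<String> record : group(lines)) {
            System.out.println(record);
        }
    }
}
```

The harder part in a distributed setting is making sure a split boundary never falls inside a record, which this sketch does not address.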
I am working with Hadoop 0.20, and wish to use the NLinesInputFormat, but this functionality isn't present?
Is there an alternative?
Here's what I'm trying to do:
Records in the data span multiple lines, ... |
Is there a way to restrict the number of concurrent reduce slots per user in hadoop? We want to ensure no single user is using up all available reduce slots at ... |
I am running a Hadoop job. I have a FileSystem object and a Path object, and I want to know the size of the file at that Path.
Any idea?
|
Recently I was trying to understand the working of Mumak (see, e.g., MAPREDUCE-728)
It basically takes a job trace and a topology trace and simulates Hadoop.
I couldn't understand how it assigns ... |
I am wondering what tools do people use for generating documentation for Big Data analytics. By that I mean aggregating, ranking, clustering, etc. multi-terabyte data sets using things such as Hadoop, ... |
I'm in the process of building a complete 'scale-out'able solution to provide in-depth realtime analytics to our customers.
The customers mainly have up to 200 servers, each having at most 400 sessions ... |
I'm trying to run hadoop jar /usr/lib/hadoop/hadoop-examples.jar aggregatewordcount /data/gutenberg/huckfinn.txt output/guten4 but get an error "huckfinn.txt not a SequenceFile".
I read on other sites, and see in the source ... |
There is a toArray() method in the ArrayWritable class in Hadoop which should mean: convert this ArrayWritable to an array. But the syntax of it is:
public Object toArray()
So how should we ... |
I need to train a neural network with 2-4 hidden layers, not sure yet on the structure of the actual net. I was thinking to train it using Hadoop map reduce ... |
Information
My question is regarding BigData in .Net. BigData is used to store and query huge amounts of data (Facebook, Google, Twitter, ...). Examples of BigData are MapReduce, Hadoop, Dryad, ...
Microsoft dropped ... |
What is the point in feeding a Hadoop cluster and using that cluster to feed data into a Vertica/InfoBright data warehouse?
All these vendors keep saying "we can connect with Hadoop", but ... |
While trying to run a C++ program based on this (link) on my Hadoop cluster, I got the error mentioned below.
I referred related posts (this) regarding this ... |
I am trying to use global variables in Hadoop via the Conf.set() and Context.getConfiguration().get() methods.
However, these don't seem to be working inside a cleanup method I'm using, though I am ... |
In my Hadoop environment, I need to configure my slave nodes so that when they communicate in the middle of a map/reduce job they use the internal IP instead of the ... |
I want to run a unit test, but I need to have an org.apache.hadoop.fs.FileSystem instance.
Is there a mock or any other solution for creating a FileSystem?
|
Hi all, I am pretty new to HDFS and was looking for some opinions on some conflicting answers I have recently gotten. 1. Is it a good idea to compress the stream when writing the file out to Hadoop? One person told me they had gotten a 10x benefit from doing this. Another told me that it was bad to compress ... |
I only checked the possibility of using Hadoop in the cloud, and I found some EC2 scripts which handle instance startups. I'm not sure if it is possible to increase the size of a cluster dynamically. Currently I see some static configuration files which control the number of nodes in the cluster. Since the pricing model of EC2 instances is hourly ... |
Hi, we have been using Hadoop for the past 6 months. It has changed the way we think about programming, not to mention the immense performance improvements. A few queries for Chuck: which Hadoop distribution would you be targeting, 0.20.2? Do you also cover unit testing for MapReduce programs? This is one area where not much information and guidelines ... |
|
Hadoop provides many interfaces to its filesystems, and it generally uses the URI scheme to pick the correct filesystem instance to communicate with. Although it is possible (and sometimes very convenient) to run MapReduce programs that access any of these filesystems, when you are processing large volumes of data, you should choose a distributed filesystem that has the data locality optimization, ... |
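To make the scheme-based selection concrete, here is a minimal sketch in plain Java. It only mimics the idea of dispatching on the URI scheme; in Hadoop itself the lookup happens inside calls such as `FileSystem.get(URI, Configuration)`, driven by configuration. The registry contents, labels, host name, and class name below are assumptions for illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

public class SchemeLookup {
    // Hypothetical registry mapping a URI scheme to a filesystem label.
    // Real Hadoop resolves implementation classes from its configuration.
    static final Map<String, String> REGISTRY = new HashMap<>();
    static {
        REGISTRY.put("file", "local filesystem");
        REGISTRY.put("hdfs", "Hadoop Distributed File System");
    }

    /** Picks a filesystem label based on the scheme of the given URI string. */
    static String resolve(String uriString) {
        String scheme = URI.create(uriString).getScheme();
        if (scheme == null) {
            // No scheme given: fall back to whatever the default filesystem is.
            return "default filesystem";
        }
        String label = REGISTRY.get(scheme);
        return label != null ? label : "unknown scheme: " + scheme;
    }

    public static void main(String[] args) {
        System.out.println(resolve("hdfs://namenode:8020/user/data.txt"));
        System.out.println(resolve("file:///tmp/data.txt"));
        System.out.println(resolve("/tmp/data.txt"));
    }
}
```

A path without a scheme falls through to the default, which mirrors how a bare path is usually resolved against the configured default filesystem.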
It's definitely possible to install Hadoop on a Mac. In fact, almost every developer you see at a Hadoop conference is carrying a Mac :P To be more specific, Hadoop is targeted for running on Unix and has several modes of operation. In production ("fully distributed mode"), it runs on a cluster of Unix machines, which are usually cheap Linux boxes. ... |
Hadoop is targeted for developing programs to process large data sets. It's useful whenever you have a lot of data to process or analyze. The first Hadoop application for many web companies is to analyze log data. For example, you can look at log data to see how many unique viewers you have and where they tend to come from. ... |
|
Hi, I have only just heard about Hadoop. Reading the sample chapter from Hadoop in Action made me interested, and I have some questions: 1. Is it an extendable framework? 2. Are there any other similar frameworks? If yes, how does their performance compare? 3. Can it run programs written in languages other than Java? |
|
Yes. I wrote the book because I heard the same frustrations from many people. Hadoop has a steep learning curve not because it's complicated, but because it's novel. Also, like many open source projects, a lot of the documentation is organized for reference rather than for learning. I intend my book for the general Java programmer with no background in distributed ... |
|
Search engines are about retrieval. Hadoop, with its MapReduce framework, is about data processing. Every search engine has a data processing requirement before the data is indexed, etc. Really big search engines need really big data processing frameworks. Hadoop is one such framework. But the category of data processing does not reduce to search index processing; there are plenty of ... |
I've seen a number of courses in universities where students are expected to get up to speed on Hadoop in about 2-4 weeks. My memory is a bit vague on this one, but I do remember somewhere that a mid-term homework assignment was to implement PageRank over Wikipedia articles using Hadoop. I would certainly consider that a "comfortable" level. Of course, ... |
|
As someone who doesn't use Hadoop, at least not yet, it seems to me that to really get a feel for setting up, managing, and testing an implementation of Hadoop you need to have a multiple machine setup. You can't mimic real world use cases if you're running it on one machine. Arguably it's not even helpful to set it up ... |
|
|