Have any of you tried Hadoop? Can it be used without the distributed filesystem that goes with it, in a shared-nothing architecture? Would that make sense?
I'm also interested in any performance ... |
I'm examining Hadoop as a possible tool with which to do some log analysis. I want to analyze several kinds of statistics in one run. Each line of my ... |
bash-3.2$ echo $JAVA_HOME
/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home
bash-3.2$ bin/hadoop dfs -copyFromLocal conf /user/yokkom/input2
bash-3.2$ bin/hadoop jar hadoop-*-examples.jar grep input2 output 'dfs[a-z.]+'
09/04/17 10:09:32 INFO mapred.FileInputFormat: Total input paths to process : 10
09/04/17 10:09:33 INFO mapred.JobClient: Running job: job_200904171309_0001
java.io.IOException: ... |
Can someone explain what Hadoop is in terms of the ideas behind the software? What makes it so popular and/or powerful?
|
I need a system to analyze large log files. A friend directed me to Hadoop the other day and it seems perfect for my needs. My question revolves around getting ... |
I'm a .NET programmer doing some Hadoop work in Java and I'm kind of lost here. In Hadoop I am trying to set up a Map-Reduce job where the output key of ... |
I am about to start a new project. I need to deal with a hundred gigs of data in a .NET application. It is at a very early stage now to give ... |
|
I'm the administrator for a company intranet and I'd like to start producing videos. However, we have a very small bandwidth tunnel between our locations, and I'd like to avoid hogging ... |
What is the most efficient way to look up values in a BDB for several files in parallel? If I had a Perl script which did this for one file at ... |
I want to know what Hadoop is. I have gone through Google and Wikipedia, but I am not clear on what Hadoop actually is and what is the goal of ... |
I have written a stochastic simulation in Java, which loads data from a few CSV files on disk (totaling about 100MB) and writes results to another output file (not much data, just ... |
Is it possible to run Hadoop so that it only uses spare CPU cycles? I.e. would it be feasible to install Hadoop on people's work machines so that number crunching ... |
What's a good method for assigning work to a set of remote machines? Consider an example where the task is very CPU and RAM intensive, but doesn't actually process a ... |
Is there a way to determine if a file in Hadoop is being written to? E.g. I have a process that puts logs into HDFS. I have another process ... |
I am researching Hadoop to see which of its products suit our need for quick queries against large data sets (billions of records per set).
The queries will be performed against chip ... |
Say I want to convert thousands of Word files to PDF: would using Hadoop to approach this problem make sense? Would using Hadoop have any advantage over simply using ... |
Here's my source code
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class PageRank {
public static final String MAGIC_STRING ...
|
Currently my application uses C# with Mono on Linux to communicate with local file systems (e.g. ext2, ext3). The basic operations are open a file, write/read from file and close/delete the ... |
I am developing a Java-based application; its pertinent requirements are listed below:
- Large datasets exist on several machines on the network. My program needs to (remotely) execute a Java program to process ...
|
I have started to look into Hadoop. If my understanding is right, I could process a very big file and it would get split over different nodes, however if the file ... |
Is it possible to add new nodes to Hadoop after it is started? I know that you can remove nodes (as the master tends to keep tabs on the node ... |
I am trying to output the results of my reducer to multiple files. The data results are all contained in one file, and the rest of the results are split based ... |
I am interested in the Apache Hadoop project, but I would like to know if any other tested (please mind the 'tested') projects/frameworks are out there.
Appreciate any information/links to projects similar ... |
I want to do log parsing of huge amounts of data and gather analytic information. However all the data comes from external sources and I have only 2 machines to store ... |
I wish to run a second instance of Hadoop on a machine which already has an instance of Hadoop running. After untarring the Hadoop distribution, some config files need to be changed from ... |
Hadoop has a configuration parameter, hadoop.tmp.dir, which, as per the documentation, is "A base for other temporary directories." I presume this path refers to the local file system.
I set this value to /mnt/hadoop-tmp/hadoop-${user.name}. ... |
I would love to get a sense of whether Hadoop is the right tool for the problem I have.
I'm building an offline process (once a month or once a quarter) that matches 2 ... |
Is there an input class to deal with [multiple] large XML files based on their tree structure in Hadoop? I have a set of XML files that are of the same ... |
I used hadoop to run map-reduce applications on our cluster. The jobs take around 10 hours to complete daily. I want to know the time taken for each job, and the ... |
I need to store a large number of small data objects (millions of rows per month). Once they're saved they won't change. I need to:
- store them securely
- use them for analysis (mostly ...
|
According to the Apache Avro project, "Avro is a serialization system". By saying data serialization system, does it mean that Avro is a product or an API?
Also, I am not quite sure about ... |
I am looking to do some quite processor-intensive brute force processing for string matching. I have run my prototype in a multi-threaded environment and compared the performance to an implementation ... |
I want to develop a website that will allow analysts within the company to run Hadoop jobs (choose from a set of defined jobs) and see their job's status/progress.
Is there an ... |
I want to learn hadoop. However, I don't have access to a cluster now. Is it possible for me to learn it and use it for writing programs and learn it ... |
I have set up Hadoop on an openSUSE 11.2 VM using VirtualBox. I have made the prerequisite configs. I ran this example in standalone mode successfully.
But in pseudo-distributed mode I get ... |
Do you know of any large datasets, free or low cost, to experiment with in Hadoop?
Any related pointers/links are appreciated.
Preferences:
- At least one GB of data.
- Production log data of webserver.
A few of them which I found ... |
I am new to Hadoop.
I have a file WordCount.java which references hadoop.jar and stanford-parser.jar.
I am running the following commands:
javac -classpath .:hadoop-0.20.1-core.jar:stanford-parser.jar -d ep WordCount.java
jar cvf ep.jar -C ep .
bin/hadoop ...
|
I am currently trying to perform calculations like clustering coefficient on huge graphs with the help of Hadoop. Therefore I need an efficient way to store the graph in a way ... |
In the latest Hadoop Studio the 0.18 API of Hadoop is called "Stable" and the 0.20 API of Hadoop is called "Unstable".
The distribution that comes from Yahoo is a ... |
I need some good references for using Hadoop for real-time systems like searching with little response time. I know Hadoop has its overhead of HDFS, but what's the best way of ... |
I am starting on a new Hadoop project that will have multiple Hadoop jobs (and hence multiple jar files). Using Mercurial for source control, I was wondering what would be the optimal way ... |
I'm trying to create a simple project with hadoop. I am new to IntelliJ and am trying to set the classpath to org.apache.hadoop.io. But what jar has this class?
|
I would like to know what your Hadoop development environment looks like.
Do you deploy jars to a test cluster, or run jars in local mode?
What IDE do you use and what plugins ...
|
I want to merge 2 bzip2'ed files. I tried appending one to another: cat file1.bzip2 file2.bzip2 > out.bzip2 which seems to work (this file decompressed correctly), but I want to use ... |
On some websites (like in this PDF: http://sortbenchmark.org/Yahoo2009.pdf) I see very nice graphs that visualize what a Hadoop cluster is doing at each moment.
Were these made "manually" (i.e. ... |
I've started getting into technology books to read. I want to learn Hadoop, and I find that I enjoy just reading books rather than staring at a computer screen ... |
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferByte;
import java.awt.image.Raster;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.imageio.ImageIO;
import javax.xml.soap.Text;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class blur {
public static class BlurMapper extends MapReduceBase implements Mapper<Text, BytesWritable, LongWritable, BytesWritable>
{
OutputCollector<LongWritable, BytesWritable> goutput;
...
|
Sorry to disturb again, but I like learning here.
I am using the JHLabs library of filters for buffered images. On running my code I am getting this exception:
java.lang.ArrayIndexOutOfBoundsException: 4
at ...
|
First of all, thanks for showing interest.
I'm Adarsh Sharma, presently working on Hadoop technologies such as Hive, Hadoop, HadoopDB, HBase, etc.
I have configured HadoopDB on the Hadoop cluster of 3 ... |
I am trying to take advantage of multiple pools in FairScheduler. But all my jobs are submitted by a single agent process and therefore all belong to the same user.
I have set ... |
I'm writing a simple program for enumerating triangles in directed graphs for my project. First, for each input arc (e.g. a b, b c, c a, note: a tab symbol serves ... |
I'm trying to practice some data mining algorithms on Hadoop. Can I do it with HDFS alone or do I need to use sub-projects like Hive/HBase/Pig?
Thanks,
ram.
|
I have created the following shell script for invoking a hadoop job:
#!/bin/bash
/opt/hadoop/bin/hadoop jar /path/to/job.jar com.do.something <param-1> ... <param-n> &
wait %1
STATUS=$?
if [ $STATUS -eq 0 ]
then
...
|
I'm running a Hadoop job over 1.5 TB of data, doing a lot of pattern matching. I have several machines with 16GB RAM each, and I always get OutOfMemoryException on this job ... |
I've been tasked with processing multiple terabytes worth of SCM data for my company. I set up a hadoop cluster and have a script to pull data from our SCM servers. ... |
This is a fairly well-documented error and the fix is easy, but does anyone know why Hadoop datanode NamespaceIDs can get screwed up so easily or how Hadoop assigns the NamespaceIDs ... |
Is there a way to change a valid and existing Hadoop Path object into a useful Java File object? Is there a nice way of doing this or do I need ... |
I'm trying to use JIT compilation in Clojure to generate mapper and reducer classes on the fly. However, these classes aren't being recognized by the JobClient (it's the usual ClassNotFoundException.)
If I ... |
In the code below, what do Iterator<V> and OutputCollector<K, V> mean? Are they special data types?
public void reduce(K key,
Iterator<V> values,
OutputCollector<K, V> output,
...
|
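For the question above about Iterator<V> and OutputCollector<K, V>: the angle brackets are Java generics, and K and V are type parameters that stand for whatever key and value types the caller picks. The sketch below illustrates this with plain Java only. The Collector interface, the sumValues method, and the class name are hypothetical stand-ins invented for this example; they are not Hadoop's OutputCollector or its API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class GenericsSketch {
    // Hypothetical stand-in for a collector type (not Hadoop's class).
    // K and V are type parameters: placeholders that are replaced by
    // concrete types wherever the interface is used.
    interface Collector<K, V> {
        void collect(K key, V value);
    }

    // A generic method: works for any key type K, with Integer values.
    // It sums all values for the key and emits a single (key, total) pair.
    static <K> void sumValues(K key, Iterator<Integer> values,
                              Collector<K, Integer> output) {
        int total = 0;
        while (values.hasNext()) {
            total += values.next();
        }
        output.collect(key, total);
    }

    // Runs the sketch on a fixed input and returns what was collected.
    public static List<String> demo() {
        List<String> collected = new ArrayList<>();
        // Here K is bound to String and V to Integer.
        Collector<String, Integer> output = (k, v) -> collected.add(k + "=" + v);
        sumValues("hadoop", Arrays.asList(1, 2, 3).iterator(), output);
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [hadoop=6]
    }
}
```

So Iterator<V> reads as "an iterator that yields values of type V"; it is an ordinary interface with a type parameter rather than a special data type.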
Sorry for my poor English. I hope you'll understand my problem.
I have a question about Hadoop development.
I have to train myself on a simple image processing project using Hadoop.
All I want ... |
I am brand new to Linux, Java, and Hadoop. I have created a simple MapReduce Driver that implements the Tool interface. But when I try to run the ... |
I am following the book Hadoop: The Definitive Guide.
I am confused by Example 3-1.
There is a Java source file, URLCat.java.
I use javac to compile it into URLCat.class, then ... |
I would like to start working with parsing large numbers of raw HTML pages into semantic data structures.
Just interested in the community opinion on various available tools for such a task, ... |
I have to do a project on distributed rendering of a 3D image. I can use standard algorithms. The aim is to learn Hadoop, not image processing. So can anyone ... |
I have launched a small cluster of two nodes and noticed that the master stays completely idle while the slave does all the work. I was wondering what is the way ... |
How do I create a hadoop jar that includes all dependencies in the lib folder using Gradle? Basically, similar to what fatjar does.
|
I have an Oracle database (roughly 1.2 billion records) of data with a web application sitting on top of it that generates queries (generates SQL code and returns counts). Basically you ... |
I have set up Hadoop on my laptop and ran the example program given in the installation guide successfully. But, I am not able to run a program.
rohit@renaissance1:~/hadoop/ch2$ hadoop ...
|
I just started with Hadoop. I wrote some sample Hadoop code as given in the book, but exceptions still arise during execution. The snippet of what I ... |
I have a very large string, and when I read it in Java, I get an out-of-memory error. Actually, I need to read all this string into memory ... |
We have a box that has terabytes of data (10-20TB) each day, where each file on the drive is anywhere from megabytes to gigabytes.
We want to send all these files to ... |
The Hadoop API documentation gives the following:
setJarByClass
public void setJarByClass(Class cls)
Set the Jar by finding where a given class came from.
What exactly does this explanation ... |
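As a rough illustration of what "finding where a given class came from" can mean, the sketch below asks the class loader for the location of a class's .class resource using standard Java calls only. WhereFrom and locate are hypothetical names made up for this example, and this is an assumption about the general idea, not a description of how Hadoop implements setJarByClass.

```java
import java.net.URL;

public class WhereFrom {
    // Returns the URL of the .class resource for cls, or null if the
    // class loader cannot find it. For a class packaged in a jar, the
    // URL typically names that jar.
    public static URL locate(Class<?> cls) {
        String resource = cls.getName().replace('.', '/') + ".class";
        ClassLoader loader = cls.getClassLoader();
        // Classes loaded by the bootstrap loader report a null class
        // loader, so fall back to the system resource lookup for them.
        return loader != null
                ? loader.getResource(resource)
                : ClassLoader.getSystemResource(resource);
    }

    public static void main(String[] args) {
        // The exact output depends on how and where the class was loaded.
        System.out.println(locate(WhereFrom.class));
        System.out.println(locate(String.class));
    }
}
```

The point of such a lookup is that a job only has to name one of its own classes, and the archive containing that class can then be identified and shipped with the job.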
Hey guys,
I want to allow people to put in simple text search terms, run a Pig job (if that's best? it's what I know best), and output the results (the TSV file ... |
Everything runs well in standalone mode, and when going to pseudo-distributed mode, HDFS works well: I can put files into HDFS and browse it. And I also checked ... |
'Sizzle is an open source implementation of the Sawzall programming language designed for interoperation with the Hadoop MapReduce and DFS stack.' https://github.com/anthonyu/Sizzle
|
I want to open/create a file and write some data to it in a Hadoop environment. The distributed file system I am using is HDFS.
I want to do it in pseudo ... |
I am trying to train a Naive Bayes classifier with positive/negative words extracted from a sentiment. Example:
I love this movie :))
I hate when it rains :( ... |
I haven't found an answer to this even after a bit of googling. My input files are generated by a process which chunks them out at say, when the file touches ... |
How do I create hadoop-0.21.0-core.jar from the source code?
I have checked out the source code from svn. Now I have three dirs: common, hdfs, mapred.
I want to build the hadoop-0.21.0-core.jar to run a ... |
First of all, I am new to Hadoop.
I have a small Hadoop pipes program that throws java.io.EOFException. The program takes
as input a small text file and uses hadoop.pipes.java.recordreader ... |
I want to use multiple files (actually 2 files) as input files.
They have the same data pattern.
Finally, I want to get the differing data from the two input files.
For example,
in a ... |
I want to understand whether Netezza/Hadoop is the right choice for the purposes below:
pull feed files of considerable size, at times more than a GB, from several online sources.
clean, filter, transform and ... |
Correct me if I'm wrong, but my understanding is that Hadoop does not use MPI for communication between different nodes.
What are the technical reasons for this?
I could hazard a few guesses, ... |
I have a situation where I have multiple files (100+, of 2-3 MB each) in compressed gz format present in multiple directories. For example:
A1/B1/C1/part-0000.gz
A2/B2/C2/part-0000.gz
A1/B1/C1/part-0001.gz
I have to feed ...
|
I want to do a project on Hadoop and MapReduce and present it as my graduation project. To this end, I've given it some thought, searched the internet, and came up with the ... |
Is there a column store similar to Vertica that is built on top of Hadoop? I am not talking about HBase, as it is a sparse matrix store and cannot get ... |
Has anyone here attended the Cloudera training and certification? How was the certification exam? Anything that would make the exam easy?
|
I downloaded the Hadoop Eclipse plug-in from this website:
https://issues.apache.org/jira/browse/MAPREDUCE-1262
Thus, I can run Hadoop programs inside Eclipse, but I don't know how to pass arguments with this plugin.
For example
jar ... |
Is it possible to run Hadoop on Nginx?
If so, is there any reference?
|
I am trying to set up Apache Hadoop on my system. The procedure page says "edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your ... |
I'm new to hadoop and trying to get my first non-trivial program working, and want to view standard out for debugging purposes. It's my understanding that standard out is directed into ... |
I'm new to hadoop and trying to create a file in HDFS from within the mapper of a map-reduce job.
The following code produces a NullPointerException in the last line:
DistributedFileSystem dfs = ...
|
I'm trying to create or open a file to store some output in HDFS, but I'm getting a NullPointerException when I call the exists method in the second to last line ... |
There are two arguments, a URI and a Configuration. I assume that the JobConf object that the client is set to should work for Configuration, but what about the URI?
Here is ... |
I want to create a file in HDFS that has a bunch of lines, each generated by a different call to map. I don't care about the order of the lines, ... |
I've set up and am testing out a pseudo-distributed Hadoop cluster (with namenode, job tracker, and task tracker/data node all on the same machine). The box I'm running on has about ... |
I have data files arranged in folders named by date. Directory structure:
- /data/2011/01/01
- /data/2011/01/02
and so on, and inside each directory there are around 50 files I need to parse, and I am ... |
I have a simple Hadoop Job that I successfully compiled and ran on Hadoop 0.20.2. Now I am compiling against Hadoop 0.21.0 which works fine but trying to run it yields ... |
Hello, I am Mohamad, a master's degree student.
I want to ask a question about ZooKeeper.
I read that for a write operation in ZooKeeper to be done, first the server connected ... |
I am an intermediate Java learner. I want to contribute to Apache development. I saw there is a list of Apache projects (like Hadoop, Derby, etc.). I have developed certain queries which I would like ... |