Tcpdump logs are binary files. I want to know which Hadoop FileInputFormat I should use to split the input data into chunks. Please help!
I'm trying to combine multiple files in multiple input directories into a single file, for various odd reasons I won't go into. My initial try was to write a 'nul' ...
I've been trying to use Hadoop to send N lines to a single mapper. I don't need the lines to be split already.
I've tried to use NLineInputFormat, ...
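For reference, NLineInputFormat builds each split from N consecutive lines (in the new API via `NLineInputFormat.setNumLinesPerSplit(job, n)`; in the old API via the `mapred.line.input.format.linespermap` property), but map() is still called once per line within that split. The plain-Java sketch below only models how lines are grouped into splits; the class and method names are hypothetical and no Hadoop classes are used.

```java
import java.util.ArrayList;
import java.util.List;

class NLineSplitSketch {
    // Groups input lines into consecutive chunks of at most n lines,
    // mirroring how NLineInputFormat assigns N lines to each split/map task.
    static List<List<String>> splitEveryN(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += n) {
            int end = Math.min(start + n, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "b", "c", "d", "e");
        System.out.println(splitEveryN(lines, 2)); // [[a, b], [c, d], [e]]
    }
}
```

If the goal is to see all N lines in a single map() call, one option is to buffer lines in the mapper and process them in cleanup(), since each map task receives exactly one split.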
I have a rather simple Hadoop question, which I'll try to present with an example.
Say you have a list of strings and a large file, and you want each mapper to ...
I have an algorithm that goes through a large data set, reads some text files, and searches for specific terms in those lines. I have it implemented in Java, but I ...
Typically, the input file can be partially read and processed by the Mapper function (as with text files). Is there anything that can be done to handle binaries ...
I am running Hadoop 0.20.1 under SLES 10 (SUSE).
My map task takes a file and generates a few more; I then generate my results from these files. I would like to ...
I am using Hadoop and working with a map task that creates files I want to keep. Currently I am passing these files through the collector to the reduce task. ...
I have run into a complex problem with MapReduce. I am trying to match up two unique values that are not always present together in the same line. Once ...
I am doing some text processing using Hadoop MapReduce jobs. My job is 99.2% complete and stuck on the last map task.
The last few lines of the map output show as ...
I'd like to use Apache Pig to build a large key -> value mapping, look things up in the map, and iterate over the keys. However, there does not even ...
Suppose I have a plain text file with the following data:
DataSetOne
content
content
content
DataSetTwo
content
content
content
content
...and so on...
What ...
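In Hadoop, multi-line records like these are usually handled by a custom RecordReader that keeps reading lines until the next header. The plain-Java sketch below shows only that grouping logic, assuming (hypothetically) that every line starting with "DataSet" begins a new record; the class and method names are illustrative, not a Hadoop API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HeaderRecordSketch {
    // Assumption: any line starting with "DataSet" begins a new record.
    static boolean isHeader(String line) {
        return line.startsWith("DataSet");
    }

    // Groups the lines that follow each header under that header, in input order.
    // Blank lines are skipped; content before the first header is ignored.
    static Map<String, List<String>> groupRecords(List<String> lines) {
        Map<String, List<String>> records = new LinkedHashMap<>();
        List<String> current = null;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (isHeader(line)) {
                current = new ArrayList<>();
                records.put(line, current);
            } else if (current != null) {
                current.add(line);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> input = List.of("DataSetOne", "content", "content", "content",
                "DataSetTwo", "content", "content", "content", "content");
        System.out.println(groupRecords(input));
        // {DataSetOne=[content, content, content], DataSetTwo=[content, content, content, content]}
    }
}
```

One caveat for a real RecordReader: a record can straddle a split boundary, so the reader typically skips to the first header after its split start and reads past its split end to finish the last record.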
I want to run a Hadoop job that maps inputs from a file and from Cassandra at the same time.
Is it possible?
I know how to get input from files ...
Problem
Following up on this question, it seems that a file- or disk-based Map implementation may be the right solution to the problems I mentioned there. Short version:
I have two data sets: one is historical quote data and the other is historical trade data. The data is split per symbol, per day. My question is how to load two ...
I want to share large, static in-memory data (a RAM-based Lucene index) across my map tasks in Hadoop. Is there a way for several map/reduce tasks to share the same JVM?
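One common pattern is to keep the expensive structure in a static field that is loaded at most once per JVM, and to enable JVM reuse in Hadoop 0.20 (`mapred.job.reuse.jvm.num.tasks`, or `JobConf.setNumTasksToExecutePerJvm`) so that later tasks in the same JVM find it already loaded. Note that reused JVMs run tasks one after another, not concurrently. The sketch below is plain Java; the holder class and its placeholder `HashMap` are hypothetical stand-ins for a real Lucene index.

```java
import java.util.HashMap;
import java.util.Map;

class SharedIndexHolder {
    // Hypothetical stand-in for a RAM-based Lucene index.
    private static Map<String, String> index;
    private static int loadCount = 0;

    // Loads the expensive structure at most once per JVM; later callers
    // (e.g. later tasks in a reused JVM) get the cached instance.
    static synchronized Map<String, String> get() {
        if (index == null) {
            index = load();
            loadCount++;
        }
        return index;
    }

    // Placeholder loader; a real job would build or open the index here.
    private static Map<String, String> load() {
        Map<String, String> m = new HashMap<>();
        m.put("hadoop", "doc-1");
        return m;
    }

    static synchronized int loadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        get();
        get();
        System.out.println(loadCount()); // 1
    }
}
```

In a mapper, calling `SharedIndexHolder.get()` from setup() (or configure() in the old API) keeps the load cost out of the per-record path.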
I have a requirement that my mapper may in some cases produce a new key/value for another mapper to handle. Is there a sane way to do this? I've ...
I'm currently writing a distributed application that parses PDF files with the help of Hadoop MapReduce. The input to the MapReduce job is thousands of PDF files (mostly ranging from 100 KB to ~2 MB), ...
I have a Hadoop Streaming setup that works; however, there is some overhead when initializing the mappers, which happens once per file, and since I am processing many ...
I am trying to find out the progress rate of the map tasks. Any help would be great. Thanks!
$hdfs dfs -rmr crawl
11/04/16 08:49:33 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
I'm using hadoop-0.21.0 with the default Single Node Setup configuration.
I am creating a program to analyze PDF, DOC and DOCX files. These files are stored in HDFS.
When I start my MapReduce job, I want the map function to have the ...
I have a file that contains IP packet headers in text format.
After the map phase, the reduce method is called once for each IP address. I want the values in a ...
I want to sort my values before they are passed to the reduce function. I learned that this can be achieved by setting the output key comparator class, as shown below:
conf.setOutputKeyComparatorClass(SortReducerByValuesKeyComparator.class);
and my class is ...
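Sorting values this way is usually called a secondary sort: the value (or the part of it to sort on) is moved into a composite key, the sort comparator orders by natural key and then by value, and a grouping comparator (`JobConf.setOutputValueGroupingComparator` in the old API) compares only the natural key, so one reduce call still receives all values for that key. A partitioner on the natural key is also needed so all of a key's records reach the same reducer. The plain-Java sketch below simulates just the sort-and-group step; `CompositeKey` and `shuffle` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    // Composite key: the natural key plus the value that should arrive sorted.
    static final class CompositeKey {
        final String naturalKey;
        final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Sort order: natural key first, then value. In Hadoop this is the job of
    // the class passed to setOutputKeyComparatorClass.
    static final Comparator<CompositeKey> SORT_ORDER =
            Comparator.<CompositeKey, String>comparing(k -> k.naturalKey)
                      .thenComparingInt(k -> k.value);

    // Simulates sorting, then grouping on the natural key only (the grouping
    // comparator's role), so each "reduce call" sees its values already sorted.
    static Map<String, List<Integer>> shuffle(List<CompositeKey> mapOutput) {
        List<CompositeKey> sorted = new ArrayList<>(mapOutput);
        sorted.sort(SORT_ORDER);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (CompositeKey k : sorted) {
            groups.computeIfAbsent(k.naturalKey, x -> new ArrayList<>()).add(k.value);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CompositeKey> out = List.of(
                new CompositeKey("b", 3), new CompositeKey("a", 2),
                new CompositeKey("b", 1), new CompositeKey("a", 5));
        System.out.println(shuffle(out)); // {a=[2, 5], b=[1, 3]}
    }
}
```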
I need to write a MapReduce application in Java that must be auto-recursive, meaning that for each line of the input file processed it must check all the lines of the ...
I want to implement a Hive + Hadoop MapReduce program in my application.
I am still wondering, because I have tried many times to query and find information about MapReduce programs in Hive.
My question is, is ...
I have a question about configuring a map-side inner join for multiple mappers in Hadoop.
Suppose I have two very large data sets, A and B, and I use the same partition and ...
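As I understand it, a map-side join (for example via `org.apache.hadoop.mapred.join.CompositeInputFormat`) relies on both inputs being sorted by the join key and partitioned identically, so each mapper can merge one partition of A with the matching partition of B without a shuffle. The plain-Java sketch below shows only that merge step for one pair of partitions, assuming keys are unique within each side; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class MergeJoinSketch {
    // Inner-joins two partitions that are already sorted by key.
    // Assumption: each key appears at most once per side.
    static List<String> innerJoin(List<Map.Entry<String, String>> a,
                                  List<Map.Entry<String, String>> b) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).getKey().compareTo(b.get(j).getKey());
            if (cmp < 0) {
                i++;               // key only in A: skip it
            } else if (cmp > 0) {
                j++;               // key only in B: skip it
            } else {
                out.add(a.get(i).getKey() + "\t" + a.get(i).getValue() + "\t" + b.get(j).getValue());
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> a = List.of(Map.entry("k1", "a1"), Map.entry("k2", "a2"), Map.entry("k4", "a4"));
        List<Map.Entry<String, String>> b = List.of(Map.entry("k2", "b2"), Map.entry("k3", "b3"), Map.entry("k4", "b4"));
        innerJoin(a, b).forEach(System.out::println);
    }
}
```

Because both sides are read in one forward pass, the merge needs only constant extra memory per partition pair, which is what makes the map-side join attractive for two very large data sets.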
I want to ask whether there is a way to use a reducer, or something like concatenation, to glue together my outputs from the mapper and output them as a single file ...
Using Java's Runtime.getRuntime().exec(command), I want to run a program on a Hadoop DataNode as part of the map function. This program will create MP4 files on the DataNode's local filesystem. ...
So here is an example:
Is it possible to have the same mapper run against multiple reducers at the same time? For example:
map output: {1:[1,2,3,4,5,4,3,2], 4:[5,4,6,7,8,9,5,3,3,2], 3:[1,5,4,3,5,6,7,8,9,1], so on} ...
I am trying to profile which functions consume the most time in a TeraSort Hadoop job. For my test system, I am using a basic 1-node pseudo-distributed setup. This means that ...
I use Hadoop/Hive to analyze Apache logs and compute access statistics. I wrote a UDF named GetCity to convert the remote_ip to a city name, but when I run "select GetCity(remote_ip) from ...
I am trying to implement a MapReduce job where each mapper takes 150 lines of the text file, and all the mappers run simultaneously; also, it should ...
What is the purpose of the org.apache.hadoop.mapreduce.Mapper.run() method in Hadoop? setup() is called before map(), and cleanup() is called after map(). The documentation for run() ...
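As far as I know, the default Mapper.run() is the driver loop itself: it calls setup() once, then map() for every key/value pair the context supplies, then cleanup() once. Overriding run() lets a mapper control that loop, which is what `MultithreadedMapper` does to process records on several threads. The plain-Java sketch below mirrors that structure with a hypothetical `SimpleMapper` class and an `Iterator` in place of Hadoop's `Context`; it is not the real Hadoop class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.Mapper, reduced to plain Java.
abstract class SimpleMapper<I, O> {
    protected void setup(List<O> out) {}

    protected abstract void map(I record, List<O> out);

    protected void cleanup(List<O> out) {}

    // Default driver loop: setup once, map once per record, cleanup once.
    public void run(Iterator<I> input, List<O> out) {
        setup(out);
        try {
            while (input.hasNext()) {
                map(input.next(), out);
            }
        } finally {
            cleanup(out);
        }
    }
}

// Overriding run() changes how records reach map(): here, two at a time.
class PairingMapper extends SimpleMapper<String, String> {
    @Override
    protected void map(String record, List<String> out) {
        out.add(record);
    }

    @Override
    public void run(Iterator<String> input, List<String> out) {
        setup(out);
        try {
            while (input.hasNext()) {
                String first = input.next();
                String second = input.hasNext() ? input.next() : "";
                map(first + "+" + second, out);
            }
        } finally {
            cleanup(out);
        }
    }
}

class RunDemo {
    public static void main(String[] args) {
        List<String> out = new ArrayList<>();
        new PairingMapper().run(List.of("a", "b", "c").iterator(), out);
        System.out.println(out); // [a+b, c+]
    }
}
```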
I'm trying to set the number of map tasks to run in a Hadoop 0.20 environment.
I am using the old API.
Here are the options I've tried so far:
conf.set("mapred.tasktracker.map.tasks.maximum", ...
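For context: `mapred.tasktracker.map.tasks.maximum` caps how many map tasks a TaskTracker runs concurrently, and `JobConf.setNumMapTasks` (`mapred.map.tasks`) is only a hint. The actual number of map tasks comes from the input splits. The plain-Java sketch below reproduces, as I understand it, the old-API `FileInputFormat` arithmetic for one file: the split size is `max(minSize, min(goalSize, blockSize))`, and a trailing piece up to 10% larger than a split is not cut off separately. The class name and sample sizes are illustrative assumptions.

```java
class SplitCountSketch {
    // A final split may be up to 10% larger than splitSize instead of
    // producing a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    // Old-API style split size: goalSize is totalSize / requested map count.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Counts the splits (and therefore map tasks) produced for one file.
    static int splitsForFile(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 200 * mb;                                 // one 200 MB input file
        long blockSize = 64 * mb;                                   // HDFS block size
        int requestedMaps = 2;                                      // the mapred.map.tasks hint
        long goalSize = fileLength / requestedMaps;                 // 100 MB
        long splitSize = computeSplitSize(goalSize, 1, blockSize);  // capped at 64 MB
        System.out.println(splitsForFile(fileLength, splitSize));   // 4
    }
}
```

So requesting more maps than blocks can shrink the split size and add tasks, but requesting fewer cannot push a split above the block size, and each input file contributes at least one split of its own.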
This is my first time using MapReduce. I want to write a program that processes a large log file. For example, if I were processing a log file that had records ...
I added the following to my conf/mapred-site.xml:
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
</property>
But when I run the job, it still runs 2 maps (which is the default)? ...
I am writing a small Hadoop program in Java. My requirement is to emit two outputs from a single map method and handle both of them in a single reduce method. ...
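One common way to handle this is to emit both pairs under the same key and tag each value with its origin, so a single reduce call can tell them apart. The plain-Java sketch below models that tagging idea; the input format `key,first,second` and the `A:`/`B:` tags are hypothetical choices for illustration, not anything Hadoop defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TaggedEmitSketch {
    // Hypothetical input format "key,first,second": one map call emits two
    // pairs for the same key, tagging each value so the reducer can tell them apart.
    static List<Map.Entry<String, String>> map(String line) {
        String[] f = line.split(",");
        return List.of(Map.entry(f[0], "A:" + f[1]), Map.entry(f[0], "B:" + f[2]));
    }

    // One reduce call receives both kinds of value for a key and separates them by tag.
    static String reduce(String key, List<String> values) {
        List<String> a = new ArrayList<>();
        List<String> b = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith("A:")) {
                a.add(v.substring(2));
            } else if (v.startsWith("B:")) {
                b.add(v.substring(2));
            }
        }
        return key + " A=" + a + " B=" + b;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<String, String> e : map("k1,10,20")) {
            values.add(e.getValue());
        }
        System.out.println(reduce("k1", values)); // k1 A=[10] B=[20]
    }
}
```

If the two outputs need different keys instead, the same idea applies with the tag placed in the key, or with a custom Writable carrying a type field.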
Any idea how I can store a Map object in an org.apache.hadoop.conf.Configuration?
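Configuration stores string properties (`set(String, String)` / `get(String)`), so one approach is to flatten a small map into a single string in the driver and parse it back in the mapper's setup(). The plain-Java sketch below shows only the encoding; `MapConfCodec` and a property name such as `my.job.lookup` are hypothetical, and it assumes keys and values never contain `,` or `=`. For large maps, shipping a file with DistributedCache is usually a better fit than the job configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class MapConfCodec {
    // Flattens a map into "k1=v1,k2=v2".
    // Assumption: keys and values contain no ',' or '=' characters.
    static String encode(Map<String, String> map) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> e : map.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    // Reverses encode(); null or an empty string yields an empty map.
    static Map<String, String> decode(String encoded) {
        Map<String, String> map = new LinkedHashMap<>();
        if (encoded == null || encoded.isEmpty()) {
            return map;
        }
        for (String pair : encoded.split(",")) {
            int eq = pair.indexOf('=');
            map.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("us", "United States");
        m.put("fr", "France");
        String encoded = encode(m);
        System.out.println(encoded);          // us=United States,fr=France
        System.out.println(decode(encoded));  // {us=United States, fr=France}
    }
}
```

Usage would then be `conf.set("my.job.lookup", MapConfCodec.encode(map))` in the driver and `MapConfCodec.decode(conf.get("my.job.lookup"))` in setup().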
I have a class something like this in Java for Hadoop MapReduce:
public class MyClass {
public static class MyClassMapper extends Mapper {
...
I created two tables with the same format:
CREATE TABLE info(mymap MAP)
and
CREATE TABLE info_1(mymap MAP)
Now I have managed to load some data into info, and I want to make info_1 a dup ...
I've recently started looking into the MapReduce/Hadoop framework and am wondering whether my problem truly lends itself to the framework.
Consider an example where I have a large set ...
I have gone through a few Hadoop books and papers.
A slot is a map/reduce computation unit on a node; it may be a map slot or a reduce slot.
As far as I know, split ...
I have a sequence file which is the output of a Hadoop MapReduce job.
In this file, data is written as key-value pairs, and the value itself is a map.
I want to read ...
Hi Chuck Lam, suppose I have some data and I want to process it iteratively, grouping by a different key each time. I think this could be done by running several Hadoop tasks, but each would have an initial load, that is, the initial I/O and the mapping process. My idea was to map once and then do several reduces. Those reduces would emit ...