Assume I have the following input in Pig:
some
And I would like to convert that into:
s
so
som
some
I've not (yet) found a way to iterate over a chararray in Pig Latin. I have found ... |
Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:
REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
log = LOAD '/data/logs' USING SequenceFileLoader AS (...)
Is there also a library out there that would ... |
I have a User Defined Function (UDF) written in Java to parse lines in a log file and return information back to Pig, so it can do all the processing.
It looks ... |
I have a Pig program where I am trying to compute the minimum center between two bags. In order for it to work, I found I need to COGROUP the ... |
We are using Pig 0.6 to process some data. One of the columns of our data is a space-separated list of ids (such as: 35 521 225). We are ... |
My background - 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's paper on Map-Reduce and GFS.
I understand ... |
I have the following scenario:
Pig version used 0.70
Sample HDFS directory structure:
/user/training/test/20100810/<data files>
/user/training/test/20100811/<data files>
/user/training/test/20100812/<data files>
/user/training/test/20100813/<data files>
/user/training/test/20100814/<data files>
As you can see in the paths listed above, one of the directory names is a ... |
|
Is there a way to do this? E.g., pass the name of the file to be processed, etc.?
|
I read in a CSV file that contains numeric fields quoted like this: "3".
Can I convert these fields from "3" to 3 with Pig Latin? I need it to use the SUM() - ... |
I'm working on a JsonStorage for Pig. Everything works fine, but I still need to get the names of the fields (i.e. crdate, name, positions) from the Pig schema.
| A ...
|
I'm using Pig Latin for log processing because of its expressiveness, in a problem where the data is not big enough to justify setting up a whole Hadoop cluster. I'm running ... |
Is there a way to export the results from Pig directly to a database like MySQL?
|
I have a set of records that I am loading from a file and the first thing I need to do is get the max and min of a column. ... |
I am trying to parse tab separated data files generated by our services using Amazon's Elastic Map Reduce via a Pig program. Things are going well except that all of our ... |
Hey all, I followed the steps here: http://wiki.apache.org/pig/PiggyBank
to build the piggybank jar but I keep getting the output below. I also built the pig project from source and reference ... |
As I've noted previously, Pig doesn't cope well with empty (0-byte) files. Unfortunately, there are lots of ways that these files can be created (even within Hadoop utilities).
I thought ... |
I have a Pig job which analyzes log files and writes summary output to S3. Instead of writing the output to S3, I want to convert it to a JSON payload ... |
I have a mysqldump of the format:
INSERT INTO `MY_TABLE` VALUES (893024968,'342903068923468','o03gj8ip234qgj9u23q59u','testing123','HTTP','1','4213883b49b74d3eb9bd57b7','blahblash','2011-04-19 00:00:00','448','206',NULL,'GG');
How do I load this data using Pig? I have tried:
A = LOAD 'pig-test/test.log' USING PigStorage(',') AS (ID: chararray, ...
|
I have very little knowledge of Pig. I have a protobuf-format data file. I need to load this file into a Pig script. I need to write a LoadFunc UDF to ... |
I need to "transpose" data that looks like this:
id City
111 Chicago
111 New York ...
|
Assuming I have lines of data like the following that show user names and their favorite fruits:
Alice\tApple
Bob\tApple
Charlie\tGuava
Alice\tOrange
I'd like to create a Pig query that shows the favorite fruit of each user. ... |
I have a file in HDFS with 100 columns, which I want to process using Pig. I want to load this file into a tuple with columns names in a ... |
I am very new to Pig and I am having what feels like a very basic problem.
I have a line of code that reads:
A = load 'Sites/trial_clustering/shortdocs/*'
...
|
I have a file of format <"id_1","id_2","id_3","id_4">. The file is stored in CSV format. I am able to read each field as "chararray". But, I want to read them as int, ... |