
1. Splitting input into substrings in PIG (Hadoop)    stackoverflow.com

Assume I have the following input in Pig:

And I would like to convert that into:
I've not (yet) found a way to iterate over a chararray in Pig Latin. I have found ...
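As far as I know, Pig Latin has no loop construct for walking over a chararray, so this kind of split is usually done in a UDF. The sketch below is plain Java with no Pig classes; it only shows the splitting logic such a UDF might wrap (for example, an EvalFunc returning a DataBag with one tuple per piece). The class name and the fixed piece size are assumptions, since the desired output in the question is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: plain Java, no Pig dependency.
class SubstringSplitter {
    // Splits input into consecutive pieces of `size` chars (UTF-16 units);
    // the last piece may be shorter. size == 1 walks character by character.
    static List<String> split(String input, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> pieces = new ArrayList<>();
        if (input == null) {
            return pieces; // treat a null chararray as "no pieces"
        }
        for (int i = 0; i < input.length(); i += size) {
            pieces.add(input.substring(i, Math.min(input.length(), i + size)));
        }
        return pieces;
    }

    public static void main(String[] args) {
        System.out.println(split("abcde", 2)); // [ab, cd, e]
        System.out.println(split("abc", 1));   // [a, b, c]
    }
}
```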

2. Storing data to SequenceFile from Apache Pig    stackoverflow.com

Apache Pig can load data from Hadoop sequence files using the PiggyBank SequenceFileLoader:

REGISTER /home/hadoop/pig/contrib/piggybank/java/piggybank.jar;
DEFINE SequenceFileLoader org.apache.pig.piggybank.storage.SequenceFileLoader();
log = LOAD '/data/logs' USING SequenceFileLoader AS (...)

Is there also a library out there that would ...

3. Does throwing an exception in an EvalFunc pig UDF skip just that line, or stop completely?    stackoverflow.com

I have a User Defined Function (UDF) written in Java to parse lines in a log file and return information back to pig, so it can do all the processing. It looks ...
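To my understanding, an exception that escapes a UDF's exec() fails the task attempt rather than skipping the record, and repeated failures fail the job. A common workaround is to catch the problem and return null, which a later FILTER ... IS NOT NULL can drop. The sketch below shows only that parse-or-null logic in plain Java; the log format "<epochSeconds> <LEVEL> <message>" and the class name are hypothetical.

```java
// Hypothetical log format for illustration: "<epochSeconds> <LEVEL> <message>".
class SafeLogParser {
    // Returns {epochSeconds, level, message}, or null for a malformed line.
    // A UDF's exec() could return null here instead of throwing.
    static String[] parse(String line) {
        if (line == null) {
            return null;
        }
        String[] parts = line.split(" ", 3);
        if (parts.length < 3) {
            return null;
        }
        try {
            Long.parseLong(parts[0]); // validate the timestamp field
        } catch (NumberFormatException e) {
            return null;
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", parse("1303171200 INFO request served")));
        System.out.println(parse("garbage")); // null
    }
}
```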

4. How can I load a file into a DataBag from within a Yahoo PigLatin UDF?    stackoverflow.com

I have a Pig program where I am trying to compute the minimum center between two bags. In order for it to work, I found I need to COGROUP the ...

5. Pass a relation to a PIG UDF when using FOREACH on another relation?    stackoverflow.com

We are using Pig 0.6 to process some data. One of the columns of our data is a space-separated list of ids (such as: 35 521 225). We are ...
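Rather than passing a whole relation into a UDF, one common approach is to turn the id list into one row per id (Pig's built-in TOKENIZE combined with FLATTEN can do this) and then JOIN against the other relation. The plain-Java sketch below shows only the splitting step for a field such as "35 521 225"; the class name is hypothetical, and the join itself would stay in Pig.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a whitespace-separated id list into longs.
class IdListSplitter {
    static List<Long> splitIds(String field) {
        List<Long> ids = new ArrayList<>();
        if (field == null) {
            return ids;
        }
        for (String token : field.trim().split("\\s+")) {
            if (!token.isEmpty()) { // "".split(...) yields one empty token
                ids.add(Long.parseLong(token));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(splitIds("35 521 225")); // [35, 521, 225]
    }
}
```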

6. Difference between Pig and Hive? Why have both?    stackoverflow.com

My background: 4 weeks old in the Hadoop world. Dabbled a bit in Hive, Pig and Hadoop using Cloudera's Hadoop VM. Have read Google's paper on MapReduce and GFS. I understand ...

7. Pig Latin: Load multiple files from a date range (part of the directory structure)    stackoverflow.com

I have the following scenario. Pig version used: 0.7.0. Sample HDFS directory structure:

/user/training/test/20100810/<data files>
/user/training/test/20100811/<data files>
/user/training/test/20100812/<data files>
/user/training/test/20100813/<data files>
/user/training/test/20100814/<data files>
As you can see in the paths listed above, one of the directory names is a ...
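Pig hands LOAD paths to Hadoop, which understands glob patterns such as {a,b}, so a date range can be expressed as a single path. The sketch below builds such a glob from a start and end date (inclusive) using java.time; the class and method names are hypothetical. The resulting string could then be passed to the script, for example with pig -param, and referenced in LOAD.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds "<base>/{yyyyMMdd,...}" for an inclusive range.
class DateRangeGlob {
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd

    static String globFor(String base, LocalDate start, LocalDate end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end is before start");
        }
        List<String> days = new ArrayList<>();
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            days.add(d.format(DAY));
        }
        return base + "/{" + String.join(",", days) + "}";
    }

    public static void main(String[] args) {
        // Prints /user/training/test/{20100810,20100811,20100812,20100813,20100814}
        System.out.println(globFor("/user/training/test",
                LocalDate.of(2010, 8, 10), LocalDate.of(2010, 8, 14)));
    }
}
```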

8. Hadoop Pig: Passing Command Line Arguments    stackoverflow.com

Is there a way to do this, e.g. pass the name of the file to be processed?

9. Convert "3" to 3 with PigLatin    stackoverflow.com

I read in a CSV file that contains fields with numbers like this: "3". Can I convert these fields from "3" to 3 with Pig Latin? I need it to use the SUM() - ...
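As far as I know, casting a chararray that still contains the quote characters to int yields null in Pig, so the quotes have to be removed first. The plain-Java sketch below shows the strip-and-parse logic a small UDF could wrap; the class name is hypothetical, and returning null for non-numeric input is a design choice so that SUM() can simply skip bad values.

```java
// Hypothetical helper: "\"3\"" -> 3, invalid input -> null.
class QuotedNumber {
    static Integer parseQuotedInt(String field) {
        if (field == null) {
            return null;
        }
        String s = field.trim();
        // Remove one pair of surrounding double quotes, if present.
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseQuotedInt("\"3\"")); // 3
    }
}
```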

10. Get names of field schema from Pig    stackoverflow.com

I'm working on a JsonStorage for Pig. Everything works fine, but I still need to get the names of the fields (i.e. crdate, name, positions) from the Pig schema.

| A ...

11. Fine tuning PIG for local execution    stackoverflow.com

I'm using Pig Latin for log processing because of its expressiveness, in a problem where the data is not big enough to justify setting up a whole Hadoop cluster. I'm running ...

12. A way to export the results from Pig to a database    stackoverflow.com

Is there a way to export the results from Pig directly to a database like MySQL?

13. Max/Min for whole sets of records in PIG    stackoverflow.com

I have a set of records that I am loading from a file, and the first thing I need to do is get the max and min of a column. ...
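The usual Pig idiom for a whole-relation aggregate is GROUP ... ALL followed by MIN and MAX on the column. For reference, the plain-Java sketch below computes the same pair of values in a single pass; the class name is hypothetical and it assumes a non-empty column of longs with no nulls.

```java
import java.util.Arrays;

// Hypothetical helper: {min, max} of a non-empty column in one pass.
class MinMax {
    static long[] minMax(long[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        long min = values[0];
        long max = values[0];
        for (long v : values) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(minMax(new long[] {5, -2, 9, 3}))); // [-2, 9]
    }
}
```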

14. How do I trim a header row from files processed by Hadoop's Pig?    stackoverflow.com

I am trying to parse tab-separated data files generated by our services using Amazon's Elastic MapReduce via a Pig program. Things are going well except that all of our ...
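When a job reads many files, every file carries its own header and the input is processed in parallel splits, so there is no single "first line" to skip. Filtering by content is more robust: drop any row whose first field equals the header's first column name (in Pig, a FILTER on that field). The plain-Java sketch below shows that rule; the class name and the header value "id" in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: removes header rows by content, not by position.
class HeaderFilter {
    static List<String> dropHeaders(List<String> lines, String headerFirstField) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String first = line.split("\t", -1)[0]; // first tab-separated field
            if (!first.equals(headerFirstField)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("id\tname", "1\ta", "id\tname", "2\tb");
        System.out.println(dropHeaders(lines, "id").size()); // 2
    }
}
```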

15. Unable to build piggybank -> /home/build/ivy/lib does not exist    stackoverflow.com

Hey all, I followed the steps here: http://wiki.apache.org/pig/PiggyBank to build the piggybank jar, but I keep getting the output below. I also built the pig project from source and reference ...

16. How does Pig use Hadoop Globs in a 'load' statement?    stackoverflow.com

As I've noted previously, Pig doesn't cope well with empty (0-byte) files. Unfortunately, there are lots of ways that these files can be created (even within Hadoop utilities). I thought ...

17. POST Hadoop Pig output to a URL as JSON data?    stackoverflow.com

I have a Pig job which analyzes log files and writes summary output to S3. Instead of writing the output to S3, I want to convert it to a JSON payload ...

18. Loading from mysqldump with PIG    stackoverflow.com

I have a mysqldump of the format:

INSERT INTO `MY_TABLE` VALUES (893024968,'342903068923468','o03gj8ip234qgj9u23q59u','testing123','HTTP','1','4213883b49b74d3eb9bd57b7','blahblash','2011-04-19 00:00:00','448','206',NULL,'GG');
How do I load this data using Pig? I have tried:
A = LOAD 'pig-test/test.log' USING PigStorage(',') AS (ID: chararray, ...
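PigStorage(',') splits on every comma, including the INSERT prefix and any comma inside a quoted value, so the fields come out misaligned. One option is to parse each statement in a UDF or a preprocessing step. The plain-Java sketch below handles a single-row INSERT like the one above: unquoted NULL becomes null, and inside quotes a backslash escapes the next character. The class name is hypothetical, sequences such as \n are not translated, and multi-row INSERTs are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: fields of ONE single-row "INSERT ... VALUES (...);".
class MysqlDumpRow {
    static List<String> parse(String stmt) {
        int open = stmt.indexOf('(');
        int close = stmt.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("no VALUES tuple found");
        }
        String body = stmt.substring(open + 1, close);
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        boolean wasQuoted = false;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (inQuotes) {
                if (c == '\\' && i + 1 < body.length()) {
                    cur.append(body.charAt(++i)); // keep the escaped char, e.g. \' -> '
                } else if (c == '\'') {
                    inQuotes = false;             // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '\'') {
                inQuotes = true;
                wasQuoted = true;
                cur.setLength(0);                 // drop whitespace before the quote
            } else if (c == ',') {
                fields.add(finish(cur, wasQuoted));
                cur.setLength(0);
                wasQuoted = false;
            } else if (!wasQuoted) {
                cur.append(c);                    // unquoted value such as 893024968 or NULL
            }
        }
        fields.add(finish(cur, wasQuoted));
        return fields;
    }

    private static String finish(StringBuilder cur, boolean wasQuoted) {
        if (wasQuoted) {
            return cur.toString();
        }
        String s = cur.toString().trim();
        return s.equals("NULL") ? null : s;
    }

    public static void main(String[] args) {
        String stmt = "INSERT INTO `T` VALUES (1,'it\\'s, ok',NULL);";
        System.out.println(parse(stmt).size()); // 3
    }
}
```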

19. Loading protobuf format file into pig script using loadfunc pig UDF    stackoverflow.com

I have very little knowledge of Pig. I have a protobuf-format data file. I need to load this file into a Pig script. I need to write a LoadFunc UDF to ...

20. Transpose data in Apache Pig Latin    stackoverflow.com

I need to "transpose" data that looks like this:

id      City   
111     Chicago  
111     New York ...
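The target layout is cut off in the question, so this assumes the goal is one record per id holding all of its cities, which is what GROUP BY id produces in Pig. The plain-Java sketch below shows the same grouping while keeping first-seen order; the class name is hypothetical, and rows are {id, city} pairs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: (id, city) rows -> id -> [city, ...].
class Transposer {
    static Map<String, List<String>> groupById(List<String[]> rows) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (String[] row : rows) {
            out.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"111", "Chicago"},
                new String[] {"111", "New York"});
        System.out.println(groupById(rows)); // {111=[Chicago, New York]}
    }
}
```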

21. Is it possible to detect and handle string collisions among grouped values when grouping in Hadoop Pig?    stackoverflow.com

Assuming I have lines of data like the following that show user names and their favorite fruits:

I'd like to create a pig query that shows the favorite fruit of each user. ...
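One way to detect a collision is to collect the distinct values per user and flag users with more than one; in Pig this roughly corresponds to GROUP BY user followed by a nested FOREACH with DISTINCT and COUNT. The plain-Java sketch below shows that logic. The sample rows from the question are not shown, so the {user, fruit} data in the example and the class name are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: distinct fruits per user, plus the users that disagree.
class FruitConflicts {
    static Map<String, Set<String>> fruitsByUser(List<String[]> rows) {
        Map<String, Set<String>> byUser = new LinkedHashMap<>();
        for (String[] row : rows) {
            byUser.computeIfAbsent(row[0], k -> new LinkedHashSet<>()).add(row[1]);
        }
        return byUser;
    }

    // A "collision" is a user with more than one distinct fruit.
    static List<String> conflictingUsers(Map<String, Set<String>> byUser) {
        List<String> users = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : byUser.entrySet()) {
            if (e.getValue().size() > 1) {
                users.add(e.getKey());
            }
        }
        return users;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[] {"alice", "apple"},
                new String[] {"bob", "pear"},
                new String[] {"bob", "plum"});
        System.out.println(conflictingUsers(fruitsByUser(rows))); // [bob]
    }
}
```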

22. how to call a pig script within another pig script    stackoverflow.com

I have a file in HDFS with 100 columns, which I want to process using Pig. I want to load this file into a tuple with column names in a ...

23. using PIG to load a file    stackoverflow.com

I am very new to PIG and I am having what feels like a very basic problem. I have a line of code that reads:

A = load 'Sites/trial_clustering/shortdocs/*'

24. Read and parse using Pig Latin    stackoverflow.com

I have a file of format <"id_1","id_2","id_3","id_4">. The file is stored in CSV format. I am able to read each field as "chararray", but I want to read them as int, ...
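Reading the fields as int requires removing the surrounding double quotes first. The plain-Java sketch below parses a whole line such as "1","22","3","40" into an int array and checks the field count; the class name is hypothetical, the angle brackets in the question are assumed to be notation rather than literal characters, and ids are assumed never to contain commas or escaped quotes.

```java
import java.util.Arrays;

// Hypothetical helper: "\"1\",\"22\",\"3\",\"40\"" -> {1, 22, 3, 40}.
class QuotedIdLine {
    static int[] parse(String line, int expectedFields) {
        String[] parts = line.trim().split(",", -1);
        if (parts.length != expectedFields) {
            throw new IllegalArgumentException(
                    "expected " + expectedFields + " fields, got " + parts.length);
        }
        int[] ids = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            String p = parts[i].trim();
            // Strip one pair of surrounding double quotes.
            if (p.length() >= 2 && p.charAt(0) == '"' && p.charAt(p.length() - 1) == '"') {
                p = p.substring(1, p.length() - 1);
            }
            ids[i] = Integer.parseInt(p); // NumberFormatException on non-numeric ids
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("\"1\",\"22\",\"3\",\"40\"", 4))); // [1, 22, 3, 40]
    }
}
```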