pig « hadoop « Java Database Q&A

1. Hadoop pig latin style guide?    stackoverflow.com

I'm looking to take a shortcut on formatting/style for Pig Latin (hadoop-ay). Does anyone know where I can find a style guide? -daniel

2. Examples of simple stats calculation with hadoop    stackoverflow.com

I want to extend an existing clustering algorithm to cope with very large data sets and have redesigned it in such a way that it is now computable with partitions of ...

3. Regexp matching in pig    stackoverflow.com

Using apache pig and the text

hahahah.  my brother just didnt do anything wrong. He cheated on a test? no way!
I'm trying to match "my brother just didnt do anything wrong." Ideally, ...

4. generating bigram combinations from grouped data in pig    stackoverflow.com

Given my input data in userid,itemid format:

raw: {userid: bytearray,itemid: bytearray}

dump raw;

grpd = GROUP raw BY userid;

dump grpd;

I'd like to generate all of the combinations (order not important) of items within each group. ...
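
To make the desired result concrete, here is a minimal Python sketch of the logic, written outside Pig. It assumes that "combinations" means unordered pairs of distinct items per user; the function name `item_pairs_by_user` and the sample rows are hypothetical and do not come from the question.

```python
from collections import defaultdict
from itertools import combinations


def item_pairs_by_user(rows):
    """Group (userid, itemid) rows by userid and list unordered item pairs.

    Assumption: duplicates of the same item for one user count once.
    """
    grouped = defaultdict(set)
    for userid, itemid in rows:
        grouped[userid].add(itemid)
    # Sorting makes the pair order deterministic; (a, b) and (b, a) are one pair.
    return {
        userid: list(combinations(sorted(items), 2))
        for userid, items in grouped.items()
    }


# Hypothetical sample data in userid,itemid form.
sample_rows = [("u1", "a"), ("u1", "b"), ("u1", "c"), ("u2", "a")]
pairs = item_pairs_by_user(sample_rows)
```

A user with a single item yields an empty list, since no pair can be formed.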

5. Is there a canonical problem that provably can't be aided with map/reduce?    stackoverflow.com

I'm trying to understand the boundaries of hadoop and map/reduce and it would help to know a non-trivial problem, or class of problems, that we know map/reduce can't assist in. It certainly ...

6. Merging multiple files into one within Hadoop    stackoverflow.com

I get multiple small files into my input directory which I want to merge into a single file without using the local file system or writing mapreds. Is there a way ...

7. What are the environment settings in Apache Pig and Hadoop Connection to run tutorial scripts?    stackoverflow.com

I have been trying to run the Pig tutorial scripts on Ubuntu for two days; however, I cannot manage to make Pig connect to the Hadoop file system. It is still saying: ...

8. Is pig hadoop needed for what I want to do?    stackoverflow.com

I have a question for you, well, a clarification... I developed a program that uses Hadoop map reduce which gets just a column from a dataset (csv file) and processes this data ...

9. Pig Version Mismatch (Hadoop)    stackoverflow.com

Has anyone met this problem before? This is the error log: Protocol org.apache.hadoop.mapred.JobSubmissionProtocol version mismatch. (client = 20, server = 21) I used pig 0.8.0 and my hadoop version is 0.20.10. I appreciate if ...

10. How to "update" a column using pig latin    stackoverflow.com

Imagine I have the following table available to me:

A: { x: int, y: int, z: int, ...99 other columns... }
I now want to transform this, such that z is set to ...

11. If I have a constructor that requires a path to a file, how can I "fake" that if it is packaged into a jar?    stackoverflow.com

The context of this question is that I am trying to use the maxmind java api in a pig script that I have written... I do not think that knowing about ...

12. Can PIG and HIVE be called separate programming models?    stackoverflow.com

This question might sound irritating, and may not actually have anything to do with real programming. It's a spin-off of a small debate I had with a colleague of mine. He ...

13. How do I read static files in a PIG UDF    stackoverflow.com

I am new to PIG and Hadoop. I have written a PIG UDF which operates on String and returns a string. I actually use a class from an already existing jar ...

14. Hadoop Hypercube    stackoverflow.com

Hey, I am starting a Hadoop-based hypercube with a flexible number of dimensions. Does anybody know any existing approaches for this? I just found PigOLAPSketch, but there is no code to ...

15. Generate multiple outputs with Hadoop Pig    stackoverflow.com

I've got this file containing a list of data in Hadoop. I've built a simple Pig script which analyzes the file by the id number, and so on... The last step I'm ...

16. Hadoop PIG output is not split in multiple files with PARALLEL operator    stackoverflow.com

Looks like I'm missing something. Although the number of reducers on my data creates that many files in HDFS, my data is not split into multiple files. What ...

17. Doing analytical queries on large dynamic sets of data    stackoverflow.com

I have a requirement where I have large sets of incoming data into a system I own. A single unit of data in this set has a set of immutable attributes ...

18. How to generate a custom schema from a relation in Pig?    stackoverflow.com

I have a schema describing tf-idf values for words in various articles. Its description looks like:

tfidf_relation: {word: chararray,id: bytearray,tfidf: double}
Here is an example of such data:
I want to get output in a ...

19. How do you deal with empty or missing input files in Apache Pig?    stackoverflow.com

Our workflow uses an AWS elastic map reduce cluster to run a series of Pig jobs to manipulate a large amount of data into aggregated reports. Unfortunately, the input data is potentially ...

20. Running Pig query over data stored in Hive    stackoverflow.com

I would like to know how to run Pig queries stored in Hive format. I have configured Hive to store compressed data (using this tutorial http://wiki.apache.org/hadoop/Hive/CompressedStorage). Before that ...

21. Skip a record in LoadFunc.getNext()    stackoverflow.com

I'm extending the LoadFunc. In the getNext function I'd like to skip returning a tuple under certain conditions - this way I could only load a sample of the data file. ...

22. loading an external properties file in udf    stackoverflow.com

When writing a UDF, let's say an EvalFunc, is it possible to pass a configuration file with

properties = new Properties();
properties.load(new FileInputStream("conf/config.properties"));
when running in Hadoop Mode? Best, Will

23. Pig Loader for SQL queries?    stackoverflow.com

I'm looking for a Pig (related to Hadoop) loader to retrieve data from a SQL Server. If you've come across one, please let me know. Thanks. - Yakov

24. run pig on hadoop could not find the result    stackoverflow.com

I ran a Pig script on a Hadoop cluster. It passed successfully, but I cannot find the result files. Here is what it said:

Successfully stored 2 records (122 bytes) in: "hdfs://ocean-01/user/root/all_users"
I ...

25. Use Hadoop Pig to load data from text file w/ each record on multiple lines?    stackoverflow.com

I have my data file in the following format:

U:    john
T:    2011-03-03 12:12:12
L:    san diego, CA

U:    john
T:    ...
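
As a rough illustration of the parsing a custom loader would have to do, here is a minimal Python sketch. It assumes that records are separated by blank lines and that every line has the form `KEY: value`; the excerpt is truncated, so both points are assumptions, and the function name `parse_records` is hypothetical.

```python
def parse_records(text):
    """Turn blocks of 'KEY: value' lines into one dict per record.

    Assumption: a blank line ends the current record.
    """
    records = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # Blank line: close the record collected so far, if any.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so timestamps keep their colons.
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


# The first record is taken from the excerpt; the second is cut short there.
sample = "U:    john\nT:    2011-03-03 12:12:12\nL:    san diego, CA\n\nU:    john\n"
parsed = parse_records(sample)
```

Splitting on the first colon only is the key design choice here, because the `T:` value itself contains colons.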

26. cant run pig with single node hadoop server    stackoverflow.com

I have set up a VM with Ubuntu. It runs Hadoop as a single node. Later I installed Apache Pig on it. Apache Pig runs great in local mode, but it always ...

27. What are some approaches to run multiple Pig scripts sequentially?    stackoverflow.com

I need to run some Pig scripts sequentially in Hadoop. They must be run separately. Any suggestions? update Just a quick update that we're working toward running the Pig scripts from ...

28. "Failed to create DataStorage" error when using Pig with Hadoop    stackoverflow.com

I've been trying to get Pig 0.9.0 to run using Apache Hadoop. I've looked high and low over Google and mailing lists and even this question: cant run pig ...

29. Apache Pig permissions issue    stackoverflow.com

I'm attempting to get Apache Pig up and running on my Hadoop cluster, and am encountering a permissions problem. Pig itself is launching and connecting to the cluster just fine- ...

30. How to Get Pig to Work with lzo Files?    stackoverflow.com

So, I've seen a couple of tutorials for this online, but each seems to say to do something different. Also, each of them doesn't seem to specify whether you're trying to ...

31. PIG : Filter a string on the basis of a word    stackoverflow.com

I have a Pig job in which I need to filter the data by finding a word in it. Here is the snippet:

A = LOAD '/home/user/filename' USING PigStorage(',');

32. Executing Pig on another framework    stackoverflow.com

I understand that Pig Latin is a data flow language. In that sense it should be theoretically possible to execute Pig Latin in any framework though currently and it is meant ...

33. Pig: Pulling individual fields out after a GROUP    stackoverflow.com

In PigLatin, I want to pull the other fields out of a record I want to select because of an aggregate, such as MAX. I'm having trouble explaining the problem, so here ...

34. getting started with pig    stackoverflow.com

This might be a really stupid question, but I'm not able to install Pig properly on my machine. Pig's version is 0.9.0. I have even set my JAVA_HOME to its designated path. I've ...

35. Java or Pig regex to strip out values from UserAgent string    stackoverflow.com

I need to strip out the third and subsequent values in the 'bracketed' component of the user agent string. In order to get

Mozilla/4.0 (compatible; MSIE 8.0)
Mozilla/4.0 (compatible; MSIE ...
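
One possible pattern for this, shown as a Python sketch rather than in Java or Pig: keep the first two semicolon-separated values inside the first pair of parentheses and drop the rest. The input string below is a hypothetical example, since the excerpt shows only the desired output, and the helper name `trim_user_agent` is an assumption as well.

```python
import re

# Group 1 captures the first two ';'-separated values inside the parentheses;
# the trailing [^)]* consumes any further values up to the closing ')'.
BRACKET_PATTERN = re.compile(r"\(([^;)]*;[^;)]*)[^)]*\)")


def trim_user_agent(user_agent):
    """Keep only the first two values of the first bracketed component."""
    return BRACKET_PATTERN.sub(r"(\1)", user_agent, count=1)


# Hypothetical input; the question only shows the wanted result.
example = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0)"
trimmed = trim_user_agent(example)
```

Strings whose bracketed component already has two or fewer values are returned unchanged.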