join « hadoop « Java Database Q&A

1. Life without JOINs... understanding, and common practices stackoverflow.com

Lots of "BAW"s (big-ass websites) use data storage and retrieval techniques that rely on huge indexed tables and on queries that won't or can't use JOINs (BigTable, HQL, ...

2. Hadoop: intervals and JOIN stackoverflow.com

I'm very new to Hadoop and I'm currently trying to join two sources of data where the key is an interval (say [date-begin/date-end]). For example: input1:

20091001-20091002    A
20091011-20091104   ...

3. Hadoop's Map-side join implements Hash join? stackoverflow.com

I am trying to implement a hash join in Hadoop. However, Hadoop already seems to have a map-side join and a reduce-side join implemented. What is the difference between these techniques and ...
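
As a rough illustration of the distinction this question asks about: a hash join builds an in-memory table from the smaller input and probes it with each record of the larger one, which is the idea behind a replicated map-side join, where every mapper holds the small dataset in memory. The sketch below is plain Python, not Hadoop code; the function name `hash_join` and the sample records are hypothetical.

```python
from collections import defaultdict

def hash_join(small, large):
    """Join two lists of (key, value) pairs with a build/probe hash join."""
    # Build phase: index the smaller input by key.
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    # Probe phase: stream the larger input and look up each key.
    joined = []
    for key, value in large:
        for match in table.get(key, []):
            joined.append((key, match, value))
    return joined

# Hypothetical sample data, for illustration only.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (2, "pen"), (1, "lamp"), (3, "cup")]
result = hash_join(users, orders)
```

A reduce-side join, by contrast, tags the records from each input and relies on the shuffle to group them by key, so neither side has to fit in memory.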

4. Is Hadoop a good open-source project to join? stackoverflow.com

I've been learning Java for the last 2 months with a Core Java book. Now I want to write something real, but at first I decided that I need to improve ...

5. Similarity join using Hadoop stackoverflow.com

I'm new to Hadoop. I'd like to run past you some approaches that I came up with. Problem:
2 datasets : A and B.
Both datasets represent songs: some top level ...

6. How would you suggest performing "Join" with Hadoop streaming? stackoverflow.com

I have two files, in the following formats: "field1, field2, field3" and "field4, field1, field5", where a different field number indicates a different meaning. I want to join the two files using Hadoop Streaming based on the mutual ...
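
One common way to approach this kind of problem is a reduce-side join: the mapper emits the join key together with a tag naming the source file, and the reducer pairs up records that share a key. The sketch below simulates that flow locally in Python. It assumes the first file has the layout field1, field2, field3, the second has field4, field1, field5, and that field1 is the join key; the function names and sample lines are hypothetical.

```python
from itertools import groupby

def map_line(line, source):
    """Emit (join_key, source_tag, remaining_fields) for one input line."""
    fields = [f.strip() for f in line.split(",")]
    if source == "A":          # assumed layout: field1, field2, field3
        key, rest = fields[0], fields[1:]
    else:                      # assumed layout: field4, field1, field5
        key, rest = fields[1], [fields[0], fields[2]]
    return key, source, rest

def reduce_join(mapped):
    """Group sorted records by key and pair every A record with every B record."""
    out = []
    for key, group in groupby(sorted(mapped), key=lambda r: r[0]):
        records = list(group)
        a_side = [r[2] for r in records if r[1] == "A"]
        b_side = [r[2] for r in records if r[1] == "B"]
        for a in a_side:
            for b in b_side:
                out.append([key] + a + b)
    return out

# Hypothetical sample lines, for illustration only.
file_a = ["k1, x, y", "k2, p, q"]
file_b = ["m, k1, n", "r, k3, s"]
mapped = [map_line(l, "A") for l in file_a] + [map_line(l, "B") for l in file_b]
joined = reduce_join(mapped)
```

In an actual streaming job the mapper and reducer would be separate scripts reading standard input, and the framework's sort between the two phases would take the place of the `sorted` call.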

7. Understanding SQL joins within WHERE clause stackoverflow.com

I have a query in SQL that I'm trying to translate into Pig Latin (for use on a Hadoop cluster). Most of the time I have no problem moving the ...

8. Combine MapReduce result with data stackoverflow.com

How could I combine these two files with map/reduce? File1. Data.

1   name:foo1,position:bar1
2   name:foo2,position:bar2
3   name:foo3,position:bar3
4   name:foo4,position:bar4
5   name:foo5,position:bar5

File2. MR computed result.

1   1,2
3 ...

9. Implementing cross join in hadoop stackoverflow.com

I am trying to implement a cross join using Hadoop in Java. Both sides of the join are too large for either of them to be kept in memory. I have tried ...

10. Is a collocated join (a-la-netezza) theoretically possible in hive? stackoverflow.com

When you join tables that are distributed on the same key and use those key columns in the join condition, each SPU (machine) in Netezza works 100% independently of the ...

11. Join vs COGROUP in PIG stackoverflow.com

Are there any advantages (with respect to performance or the number of map-reduce jobs) when I use COGROUP instead of JOIN in Pig? http://developer.yahoo.com/hadoop/tutorial/module6.html talks about the difference in ...

12. How can I do this inner join properly in Apache PIG? stackoverflow.com

I have two files, one called a-records

123^record1
222^record2
333^record3

and the other file called b-records

123^jim
123^jim
222^mike
333^joe

You can see that in file A the token 123 appears one time. In file B it's in there ...

13. How to do outer join on two columns in Pig Latin stackoverflow.com

I do outer joins on single columns in Pig like this

result = JOIN A by id LEFT OUTER, B by id;

How do I join on two columns, something like -

WHERE A.id=B.id ...