join « hadoop « Java Database Q&A

1. Life without JOINs... understanding, and common practices

Lots of "BAW"s (big-ass websites) use data storage and retrieval techniques that rely on huge indexed tables and on queries that won't or can't use JOINs (BigTable, HQL, ...

2. Hadoop: intervals and JOIN

I'm very new to Hadoop and I'm currently trying to join two sources of data where the key is an interval (say [date-begin/date-end]). For example: input1:

20091001-20091002    A
20091011-20091104   ...

3. Hadoop's Map-side join implements Hash join?

I am trying to implement a hash join in Hadoop. However, Hadoop seems to already have a map-side join and a reduce-side join implemented. What is the difference between these techniques and ...
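For readers comparing the terms: a classic hash join builds an in-memory hash table on the smaller input and then probes it while streaming the larger one. The sketch below is plain Python, not Hadoop API code, and the names `hash_join`, `users`, and `orders` are made up for illustration. One common map-side variant (a replicated join, where the small input is shipped to every mapper) follows the same build-then-probe pattern, assuming the small side fits in memory.

```python
from collections import defaultdict


def hash_join(small, large):
    """Inner-join two iterables of (key, value) pairs on key."""
    # Build phase: index the smaller input by key.
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    # Probe phase: stream the larger input and emit one row per match.
    for key, value in large:
        for match in table.get(key, ()):
            yield key, match, value


# Hypothetical sample data, not taken from the question.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
result = list(hash_join(users, orders))
```

A reduce-side join, by contrast, needs neither input to fit in memory: both inputs are shuffled by key and matched in the reducer, at the cost of moving all the data across the network.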

4. Is Hadoop a good open-source project to join?

I've been learning Java for the last 2 months with a Core Java book. Now I want to write something real, but at first I decided that I need to improve ...

5. Similarity join using Hadoop

I'm new to Hadoop. I'd like to run some approaches by you that I came up with. Problem:
2 datasets : A and B.
Both datasets represent songs: some top level ...

6. How would you suggest performing "Join" with Hadoop streaming?

I have two files, in the following formats:

field1, field2, field3
field4, field1, field5

where a different field number indicates a different meaning. I want to join the two files using Hadoop Streaming based on the mutual ...
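One way to approach this kind of problem is a reduce-side join: the mapper tags each record with the file it came from and emits the join key first, and the reducer pairs up records that share a key. The sketch below simulates that locally in plain Python. It assumes the join key is field1 (the excerpt is cut off before naming it), and the function names and sample lines are hypothetical. In a real Streaming job the mapper and reducer would read stdin and write tab-separated lines, with the framework doing the sort.

```python
from itertools import groupby


def map_a(line):
    # File A layout: field1, field2, field3 (field1 assumed to be the join key).
    f1, f2, f3 = [part.strip() for part in line.split(",")]
    return (f1, "A", (f2, f3))


def map_b(line):
    # File B layout: field4, field1, field5 (join key is the second column).
    f4, f1, f5 = [part.strip() for part in line.split(",")]
    return (f1, "B", (f4, f5))


def reduce_join(tagged):
    # sorted() stands in for the shuffle/sort that Hadoop performs between phases.
    for key, group in groupby(sorted(tagged), key=lambda rec: rec[0]):
        left, right = [], []
        for _, tag, payload in group:
            (left if tag == "A" else right).append(payload)
        # Inner join: emit every A/B combination for this key.
        for f2, f3 in left:
            for f4, f5 in right:
                yield (key, f2, f3, f4, f5)


# Hypothetical sample lines, not taken from the question.
file_a = ["k1, a2, a3", "k2, b2, b3"]
file_b = ["x4, k1, x5", "y4, k3, y5"]
tagged = [map_a(l) for l in file_a] + [map_b(l) for l in file_b]
joined = list(reduce_join(tagged))
```

Only `k1` appears in both inputs, so it is the only key that produces a joined row.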

7. Understanding SQL joins within WHERE clause

I have a query in SQL that I'm trying to translate into Pig Latin (for use on a Hadoop cluster). Most of the time I have no problem moving the ...

8. Combine MapReduce result with data

How could I combine these two files with map/reduce: File1. Data.

1   name:foo1,position:bar1
2   name:foo2,position:bar2
3   name:foo3,position:bar3
4   name:foo4,position:bar4
5   name:foo5,position:bar5
File2. MR computed result.
1   1,2
3 ...
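If the data file is small enough to hold in memory, one possible approach is a lookup join: load File1 into a dictionary and resolve each id in the computed result against it. The sketch below is plain Python and rests on a loud assumption: that the comma-separated values in File2 are ids referring to keys in File1. The excerpt is truncated, so that reading may be wrong. The helper names `load_data` and `resolve` are hypothetical, and only the rows visible in the question are used.

```python
def load_data(lines):
    """Index 'key   record' lines from File1 by key."""
    table = {}
    for line in lines:
        key, record = line.split(None, 1)
        table[key] = record.strip()
    return table


def resolve(result_lines, table):
    """Replace each id in a File2 row with the matching File1 record."""
    for line in result_lines:
        key, ids = line.split(None, 1)
        # ASSUMPTION: the value column is a comma-separated list of File1 keys.
        yield key, [table[i] for i in ids.strip().split(",") if i in table]


data = load_data([
    "1   name:foo1,position:bar1",
    "2   name:foo2,position:bar2",
    "3   name:foo3,position:bar3",
    "4   name:foo4,position:bar4",
    "5   name:foo5,position:bar5",
])
combined = dict(resolve(["1   1,2"], data))
```

In a Hadoop job the same idea would amount to shipping File1 to every mapper and doing the lookup there, which only works while File1 fits in memory.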

9. Implementing cross join in hadoop

I am trying to implement a cross join using Hadoop in Java. Both sides of the join are large enough that I can't keep either of them in memory. I have tried ...

10. Is a collocated join (à la Netezza) theoretically possible in Hive?

When you join tables which are distributed on the same key and use these key columns in the join condition, then each SPU (machine) in Netezza works 100% independently of the ...

11. Join vs COGROUP in PIG

Are there any advantages (with respect to performance or the number of MapReduce jobs) when I use COGROUP instead of JOIN in Pig? talks about the difference in ...

12. How can I do this inner join properly in Apache PIG?

I have two files, one called a-records

and the other file called b-records
You can see in file A that I have the token 123 one time. In file B it's in there ...

13. How to do outer join on two columns in Pig Latin

I do outer joins on single columns in Pig like this

result = JOIN A by id LEFT OUTER, B by id;
How do I join on two columns, something like -