com.intel.hadoop.graphbuilder.idnormalize.mapreduce
Class SortDictMR

java.lang.Object
  extended by com.intel.hadoop.graphbuilder.idnormalize.mapreduce.SortDictMR

public class SortDictMR
extends java.lang.Object

This MapReduce job partitions the dictionary produced by HashIdMR by the hash of its key, the rawId. It can instead partition the dictionary by the hash of its value, the newId, to support reverse lookup.

Input directory: a list of (rawId, vid) pairs. Output directory: $outputdir/
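
The partitioning rule described above can be illustrated with a minimal, single-machine sketch. It is not the actual MapReduce implementation: the class name DictPartitionSketch, the tab-separated "rawId<TAB>newId" line format, and the use of String.hashCode() as the hash function are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of GraphBuilder: shows hash partitioning
// of dictionary entries into a fixed number of chunks.
public class DictPartitionSketch {

    /** Maps a key to a chunk index in [0, numChunks). */
    static int chunkOf(String key, int numChunks) {
        // floorMod keeps the index non-negative even if hashCode() is negative.
        return Math.floorMod(key.hashCode(), numChunks);
    }

    /**
     * Splits dictionary lines into numChunks buckets.
     * Assumes each line is "rawId<TAB>newId" (an assumed format).
     * If hashRawVid is true the rawId selects the bucket, otherwise the newId.
     */
    static List<List<String>> partition(List<String> lines, int numChunks, boolean hashRawVid) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < numChunks; i++) {
            chunks.add(new ArrayList<>());
        }
        for (String line : lines) {
            String[] fields = line.split("\t", 2);
            String key = hashRawVid ? fields[0] : fields[1];
            chunks.get(chunkOf(key, numChunks)).add(line);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> dict = Arrays.asList("alice\t0", "bob\t1", "carol\t2");
        List<List<String>> byRawId = partition(dict, 2, true);
        for (int i = 0; i < byRawId.size(); i++) {
            System.out.println("chunk " + i + ": " + byRawId.get(i));
        }
    }
}
```

Because the chunk index depends only on the chosen key, a later lookup needs to read just one chunk: the one at chunkOf(key, numChunks). Partitioning by rawId serves forward lookup (rawId to newId), and partitioning by newId serves reverse lookup.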


Constructor Summary
SortDictMR(int numChunks, boolean hashRawVid, FieldParser vidparser)
           
 
Method Summary
 void run(java.lang.String inputpath, java.lang.String outputpath)
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SortDictMR

public SortDictMR(int numChunks,
                  boolean hashRawVid,
                  FieldParser vidparser)
Parameters:
numChunks - the number of partitions to split the dictionary into.
hashRawVid - if true, partition by hash(rawId); otherwise, partition by hash(newId).
vidparser - FieldParser for rawId.
Method Detail

run

public void run(java.lang.String inputpath,
                java.lang.String outputpath)
         throws java.io.IOException
Parameters:
inputpath - the path to the rawId-to-newId dictionary.
outputpath - the path to the output directory.
Throws:
java.io.IOException