com.intel.hadoop.graphbuilder.idnormalize.mapreduce
Class SortDictMR
java.lang.Object
com.intel.hadoop.graphbuilder.idnormalize.mapreduce.SortDictMR
public class SortDictMR
extends java.lang.Object
This MapReduce class partitions the dictionary output of HashIdMR based on
the hash of the rawId, the key. It can also be used to partition the
dictionary based on the hash of the newId, the value, for reverse lookup.
Input directory: a list of rawId-to-newId pairs. Output directory: $outputdir/
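The partitioning described above can be illustrated with a minimal, self-contained sketch. It does not use the GraphBuilder or Hadoop classes. The class name DictPartitionSketch, the helper chunkOf, the tab-separated "rawId<TAB>newId" line format, and the use of a long for newId are all assumptions made for illustration. The index formula mirrors the one used by Hadoop's default HashPartitioner; whether SortDictMR computes its hash the same way is not stated in this documentation.

```java
import java.util.ArrayList;
import java.util.List;

public class DictPartitionSketch {

    // Hypothetical helper: maps a field's hash code to a chunk index in [0, numChunks).
    // Masking with Integer.MAX_VALUE clears the sign bit so the result is never negative.
    static int chunkOf(Object field, int numChunks) {
        return (field.hashCode() & Integer.MAX_VALUE) % numChunks;
    }

    // Splits dictionary lines into numChunks lists.
    // Assumed line format: rawId<TAB>newId, with newId parseable as a long.
    // hashRawVid == true  -> partition by hash(rawId), the key
    // hashRawVid == false -> partition by hash(newId), the value (reverse lookup)
    static List<List<String>> partition(List<String> lines, int numChunks, boolean hashRawVid) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < numChunks; i++) {
            chunks.add(new ArrayList<>());
        }
        for (String line : lines) {
            String[] fields = line.split("\t", 2);
            String rawId = fields[0];
            Long newId = Long.valueOf(fields[1]);
            Object partitionField = hashRawVid ? rawId : newId;
            chunks.get(chunkOf(partitionField, numChunks)).add(line);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> dict = new ArrayList<>();
        dict.add("alice\t0");
        dict.add("bob\t1");
        dict.add("carol\t2");

        // Print the chunk contents for both partitioning modes.
        System.out.println("by rawId: " + partition(dict, 2, true));
        System.out.println("by newId: " + partition(dict, 2, false));
    }
}
```

Because every entry with the same partition field lands in the same chunk, a lookup only has to load the one chunk whose index matches the hash of the id being looked up.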
Method Summary
void run(java.lang.String inputpath, java.lang.String outputpath)
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SortDictMR
public SortDictMR(int numChunks,
boolean hashRawVid,
FieldParser vidparser)
- Parameters:
numChunks - number of partitions of the partitioned dictionary.
hashRawVid - if true, it will partition based on hash(rawId); partition by hash(newId) otherwise.
vidparser - FieldParser for rawId.
run
public void run(java.lang.String inputpath,
java.lang.String outputpath)
throws java.io.IOException
- Parameters:
inputpath - the path to a rawId-to-newId dictionary.
outputpath - the path of the output directory.
- Throws:
java.io.IOException