com.intel.hadoop.graphbuilder.preprocess.mapreduce
Class CreateGraphMR

java.lang.Object
  extended by com.intel.hadoop.graphbuilder.preprocess.mapreduce.CreateGraphMR

public class CreateGraphMR
extends java.lang.Object

This MapReduce job creates an initial edge list and vertex list from raw input data, e.g. text or XML. The resulting graph contains no self edges and no duplicate vertices or edges.

The Mapper class parses each input value provided by the InputFormat and, using a GraphTokenizer, outputs a list of Vertex objects and a list of Edge objects.
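As a rough illustration of the parsing step described above, the following plain-Java sketch turns one input line into a vertex list and an edge list. It is not the GraphBuilder GraphTokenizer interface: the class name EdgeLineTokenizer, its method names, and the "source target" line format are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tokenizer; it does not implement GraphTokenizer.
public class EdgeLineTokenizer {
    // Vertices and edges extracted from the most recently parsed line.
    private final List<String> vertices = new ArrayList<>();
    private final List<String[]> edges = new ArrayList<>();

    // Parse one input value of the assumed form "source<whitespace>target".
    public void parse(String line) {
        vertices.clear();
        edges.clear();
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) {
            return; // empty or malformed line: emit nothing
        }
        vertices.add(parts[0]);
        vertices.add(parts[1]);
        edges.add(new String[] {parts[0], parts[1]});
    }

    public List<String> getVertices() {
        return vertices;
    }

    public List<String[]> getEdges() {
        return edges;
    }

    public static void main(String[] args) {
        EdgeLineTokenizer tokenizer = new EdgeLineTokenizer();
        tokenizer.parse("a b");
        System.out.println(tokenizer.getVertices() + " " + tokenizer.getEdges().size());
    }
}
```

In the real job the Mapper would call the configured GraphTokenizer once per input value and emit the resulting vertices and edges for the Reducer to merge.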

The Reducer class applies user-defined Functionals to reduce duplicate edges and vertices. If no such Functional is provided, it outputs the first instance and discards the rest with the same identifier. It also discards self edges: v -> v. An option to discard bidirectional edges is provided by cleanBidirectionalEdge(boolean).
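The default reduce behavior described above (keep the first instance, drop self edges) can be sketched in plain Java without any Hadoop types. The class name EdgeDedupSketch and the representation of an edge as a {source, target} string pair are hypothetical. The handling of bidirectional edges is an assumption as well: this sketch treats a -> b and b -> a as duplicates and keeps whichever was seen first, which may differ from what cleanBidirectionalEdge(boolean) actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the default reduce behavior; not the actual Reducer.
public class EdgeDedupSketch {

    // Each edge is a {source, target} pair. Returns the edges that survive.
    public static List<String[]> reduceEdges(List<String[]> edges, boolean cleanBidirectional) {
        Map<List<String>, String[]> kept = new LinkedHashMap<>();
        for (String[] edge : edges) {
            String src = edge[0];
            String dst = edge[1];
            if (src.equals(dst)) {
                continue; // discard self edges: v -> v
            }
            // ASSUMPTION: with cleaning enabled, a -> b and b -> a share one key,
            // so only the first direction seen is kept.
            boolean swap = cleanBidirectional && src.compareTo(dst) > 0;
            List<String> key = swap ? Arrays.asList(dst, src) : Arrays.asList(src, dst);
            kept.putIfAbsent(key, edge); // first instance wins, later duplicates are dropped
        }
        return new ArrayList<>(kept.values());
    }

    public static void main(String[] args) {
        List<String[]> input = new ArrayList<>();
        input.add(new String[] {"a", "b"});
        input.add(new String[] {"a", "b"}); // duplicate
        input.add(new String[] {"b", "a"}); // reverse direction
        input.add(new String[] {"c", "c"}); // self edge
        System.out.println(reduceEdges(input, false).size());
        System.out.println(reduceEdges(input, true).size());
    }
}
```

A user-defined Functional would replace the "first instance wins" step with its own merge logic, for example by combining the data carried on duplicate edges.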

Input directory: Can take multiple input directories. Output directory structure:

See Also:
GraphTokenizer

Constructor Summary
CreateGraphMR(GraphTokenizer tokenizer, org.apache.hadoop.mapred.InputFormat inputformat)
          Create a Job and set tokenizer and inputformat.
 
Method Summary
 void cleanBidirectionalEdge(boolean clean)
          Set the option to clean bidirectional edges.
 org.apache.hadoop.mapred.JobConf getConf()
          Returns the JobConf of the current job.
 void run(java.lang.String[] inputpaths, java.lang.String outputpath)
          Run the job on the given input paths and write the result to the output path.
 void setFunctionClass(java.lang.Class vertexfunc, java.lang.Class edgefunc)
          Set the user-defined functions for reducing duplicate vertices and edges.
 void setUserOptions(java.util.HashMap<java.lang.String,java.lang.String> userOpts)
          Set the user-defined options.
 void setValueClass(java.lang.Class valClass)
          Set the intermediate key value class.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

CreateGraphMR

public CreateGraphMR(GraphTokenizer tokenizer,
                     org.apache.hadoop.mapred.InputFormat inputformat)
Create a Job and set tokenizer and inputformat.

Parameters:
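tokenizer - the GraphTokenizer used to parse each input value into vertices and edges.
inputformat - the InputFormat that provides the input values to the Mapper.

Method Detail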

setFunctionClass

public void setFunctionClass(java.lang.Class vertexfunc,
                             java.lang.Class edgefunc)
Set the user-defined functions for reducing duplicate vertices and edges.

Parameters:
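vertexfunc - the Functional class used to reduce duplicate vertices.
edgefunc - the Functional class used to reduce duplicate edges.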

cleanBidirectionalEdge

public void cleanBidirectionalEdge(boolean clean)
Set the option to clean bidirectional edges.

Parameters:
clean - if true, bidirectional edges are cleaned.

setValueClass

public void setValueClass(java.lang.Class valClass)
Set the intermediate key value class.

Parameters:
valClass -

getConf

public org.apache.hadoop.mapred.JobConf getConf()
Returns:
JobConf of the current job.

setUserOptions

public void setUserOptions(java.util.HashMap<java.lang.String,java.lang.String> userOpts)
Set the user-defined options.

Parameters:
userOpts - a map of option key-value pairs.

run

public void run(java.lang.String[] inputpaths,
                java.lang.String outputpath)
         throws java.lang.Exception
Throws:
java.lang.Exception