com.intel.hadoop.graphbuilder.preprocess.inputformat
Interface GraphTokenizer<VidType extends org.apache.hadoop.io.WritableComparable<VidType>,VertexData extends org.apache.hadoop.io.Writable,EdgeData extends org.apache.hadoop.io.Writable>

Type Parameters:
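VidType - the type of the vertex ID
VertexData - the type of the data attached to each vertex
EdgeData - the type of the data attached to each edge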
All Known Implementing Classes:
LinkGraphTokenizer, WordCountGraphTokenizer

public interface GraphTokenizer<VidType extends org.apache.hadoop.io.WritableComparable<VidType>,VertexData extends org.apache.hadoop.io.Writable,EdgeData extends org.apache.hadoop.io.Writable>

Tokenizes the input provided by the InputFormat into Vertex objects and Edge objects. Implementing this interface should be the first step, together with designing the InputFormat for the raw input.

See Also:
InputFormat
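
The parse-then-iterate contract described above can be sketched without any Hadoop or GraphBuilder dependency. The class below is a hypothetical stand-in, not an implementation of this interface: EdgeListTokenizerSketch is an invented name, plain Strings replace the Writable type parameters, and a String[] pair replaces Edge. It also assumes that parse is called once per input record and that each call discards the previous record's state, which this page does not state.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical, dependency-free stand-in for the GraphTokenizer contract.
// The real interface uses Hadoop Writable types and GraphBuilder's Vertex and
// Edge classes; plain Strings and a String[] pair are used here instead.
final class EdgeListTokenizerSketch {
    private final List<String> vertices = new ArrayList<>();
    private final List<String[]> edges = new ArrayList<>();

    // Mirrors parse(String): fills the internal vertex and edge fields from
    // one "source<TAB>target" record. Clearing first is an assumption.
    public void parse(String s) {
        vertices.clear();
        edges.clear();
        String[] parts = s.trim().split("\t");
        if (parts.length != 2) {
            return; // malformed record: emit nothing
        }
        vertices.add(parts[0]);
        vertices.add(parts[1]);
        edges.add(new String[] {parts[0], parts[1]});
    }

    // Mirrors getVertices(): iterates over the vertices of the last record.
    public Iterator<String> getVertices() {
        return vertices.iterator();
    }

    // Mirrors getEdges(): iterates over the edges of the last record.
    public Iterator<String[]> getEdges() {
        return edges.iterator();
    }

    public static void main(String[] args) {
        EdgeListTokenizerSketch tokenizer = new EdgeListTokenizerSketch();
        for (String line : new String[] {"a\tb", "b\tc"}) {
            tokenizer.parse(line);
            Iterator<String[]> it = tokenizer.getEdges();
            while (it.hasNext()) {
                String[] e = it.next();
                System.out.println(e[0] + " -> " + e[1]);
            }
        }
    }
}
```

A real implementation would additionally read its settings in configure(JobConf) and report its type parameters through vidClass(), vdataClass() and edataClass().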

Method Summary
 void configure(org.apache.hadoop.mapred.JobConf job)
          Configure the tokenizer from JobConf.
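 java.lang.Class edataClass()
          Returns the class of the EdgeData.
 java.util.Iterator<Edge<VidType,EdgeData>> getEdges()
          Returns an iterator over the Edge objects extracted from the input.
 java.util.Iterator<Vertex<VidType,VertexData>> getVertices()
          Returns an iterator over the Vertex objects extracted from the input.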
 void parse(java.lang.String s)
          Parse the input string and filter into internal vertex and edge fields.
 java.lang.Class vdataClass()
          Returns the class of the VertexData.
 java.lang.Class vidClass()
          Returns the class of the VidType.
 

Method Detail

configure

void configure(org.apache.hadoop.mapred.JobConf job)
Configure the tokenizer from JobConf.

Parameters:
job - the JobConf to read the tokenizer's configuration from

parse

void parse(java.lang.String s)
Parse the input string and filter into internal vertex and edge fields.

Parameters:
s - the input string to parse

getVertices

java.util.Iterator<Vertex<VidType,VertexData>> getVertices()
Returns:
an iterator over the Vertex objects extracted from the input.

getEdges

java.util.Iterator<Edge<VidType,EdgeData>> getEdges()
Returns:
an iterator over the Edge objects extracted from the input.

vidClass

java.lang.Class vidClass()
Returns:
the class of the VidType. Used by higher-level code for type safety.

vdataClass

java.lang.Class vdataClass()
Returns:
the class of the VertexData. Used by higher-level code for type safety.

edataClass

java.lang.Class edataClass()
Returns:
the class of the EdgeData. Used by higher-level code for type safety.
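
Because Java erases generic type arguments at runtime, a caller cannot recover VidType, VertexData or EdgeData from a tokenizer instance alone, which is one plausible reason the three class accessors exist. The sketch below shows how a class token can be used for a runtime type check. ClassTokenSketch and its methods are hypothetical names, and how GraphBuilder itself uses these accessors is not described on this page.

```java
// Hypothetical illustration of runtime type checks with a Class token,
// similar in spirit to what vidClass(), vdataClass() and edataClass() allow.
final class ClassTokenSketch {
    // Returns true if raw is an instance of the type the token describes.
    static boolean matches(Class<?> token, Object raw) {
        return token.isInstance(raw);
    }

    // Casts raw to T, throwing ClassCastException on a type mismatch.
    static <T> T checked(Class<T> token, Object raw) {
        return token.cast(raw);
    }

    public static void main(String[] args) {
        Object raw = "v1"; // an untyped value, e.g. freshly deserialized
        System.out.println(matches(String.class, raw));
        System.out.println(matches(Integer.class, raw));
        String id = checked(String.class, raw);
        System.out.println(id);
    }
}
```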