com.intel.hadoop.graphbuilder.preprocess.inputformat
Class XMLInputFormat

java.lang.Object
  extended by org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
      extended by org.apache.hadoop.mapred.TextInputFormat
          extended by com.intel.hadoop.graphbuilder.preprocess.inputformat.XMLInputFormat
All Implemented Interfaces:
org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>, org.apache.hadoop.mapred.JobConfigurable

public class XMLInputFormat
extends org.apache.hadoop.mapred.TextInputFormat

Built-in InputFormat for XML, borrowed from Cloud9: https://github.com/lintool/Cloud9/blob/master/src/dist/edu/umd/cloud9/collection/XMLInputFormat.java. The class recognizes begin-of-document and end-of-document tags only: everything between those delimiting tags is returned as an uninterpreted Text object.
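The core idea can be illustrated without Hadoop. The following stdlib-only Java sketch (the class name `XmlRecordExtractor` is hypothetical) scans an in-memory string for each start tag, then for the next end tag, and returns each span without interpreting it. It assumes, as Cloud9's reader does, that the delimiting tags are kept in the returned record; the actual class works on byte streams and input splits rather than strings.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal in-memory sketch of tag-delimited record extraction (hypothetical class).
 * Assumption: each record keeps its delimiting tags, as in Cloud9's reader.
 */
public class XmlRecordExtractor {

    /** Returns every substring that starts with startTag and ends with the next endTag. */
    public static List<String> extract(String doc, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = doc.indexOf(startTag, from);
            if (start < 0) break;                        // no more start tags
            int end = doc.indexOf(endTag, start + startTag.length());
            if (end < 0) break;                          // unterminated record: dropped
            int stop = end + endTag.length();
            records.add(doc.substring(start, stop));     // contents are not parsed
            from = stop;                                 // resume scanning after the end tag
        }
        return records;
    }

    public static void main(String[] args) {
        String doc = "<dump><page>a</page>noise<page>b</page></dump>";
        for (String record : extract(doc, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

Text outside any start/end pair (such as `noise` above) is skipped, and an unterminated final record is dropped; the latter is a choice of this sketch, not documented behavior of XMLInputFormat.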


Nested Class Summary
static class XMLInputFormat.XMLRecordReader
          RecordReader for XML documents. Recognizes begin-of-document and end-of-document tags only, returning a Text object containing everything between the delimiters.
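Because Hadoop cuts files into byte-range splits, a reader must avoid both losing and duplicating records that straddle a split boundary. The sketch below follows the approach of Cloud9's reader as we understand it: a record belongs to the split in which its start tag begins, and the reader may read past the split end to finish that record. The key is the byte offset of the start tag, matching the `LongWritable` key type. `SplitXmlReader` and its methods are hypothetical, JDK-only names, not the Hadoop `RecordReader` interface.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.Map;

/** Stdlib-only sketch of a split-aware, tag-delimited record reader (hypothetical API). */
public class SplitXmlReader {
    private final InputStream in;
    private final byte[] startTag;
    private final byte[] endTag;
    private final long end;   // first byte offset past this split
    private long pos;         // current byte offset within the file
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    public SplitXmlReader(InputStream in, long start, long end,
                          String startTag, String endTag) throws IOException {
        this.in = in;
        this.end = end;
        this.startTag = startTag.getBytes(StandardCharsets.UTF_8);
        this.endTag = endTag.getBytes(StandardCharsets.UTF_8);
        long skipped = 0;                 // advance the stream to the split start
        while (skipped < start) {
            long n = in.skip(start - skipped);
            if (n <= 0) throw new IOException("split start is past end of input");
            skipped += n;
        }
        this.pos = start;
    }

    /** Returns (offset of start tag, record text), or null when this split is exhausted. */
    public Map.Entry<Long, String> next() throws IOException {
        if (pos >= end || !readUntilMatch(startTag, false)) {
            return null;
        }
        long offset = pos - startTag.length;
        buffer.reset();
        buffer.write(startTag, 0, startTag.length);
        // The end tag may lie beyond the split end; keep reading to finish the record.
        if (!readUntilMatch(endTag, true)) {
            return null;                  // truncated record at end of file
        }
        String text = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        return new AbstractMap.SimpleImmutableEntry<>(offset, text);
    }

    /** Scans byte by byte for match, copying bytes into the buffer when inside a record. */
    private boolean readUntilMatch(byte[] match, boolean withinRecord) throws IOException {
        int i = 0;
        while (true) {
            int b = in.read();
            if (b == -1) return false;    // end of file
            pos++;
            if (withinRecord) buffer.write(b);
            if (b == (match[i] & 0xff)) {
                if (++i == match.length) return true;
            } else {
                // Restart the match; valid because '<' does not recur inside an XML tag.
                i = (b == (match[0] & 0xff)) ? 1 : 0;
            }
            // Only begin a new record if its start tag starts before the split end.
            if (!withinRecord && i == 0 && pos >= end) return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] file = "<r><page>a</page><page>bb</page></r>".getBytes(StandardCharsets.UTF_8);
        // Two splits whose boundary (byte 20) falls inside the second <page> tag.
        long[][] splits = {{0, 20}, {20, file.length}};
        for (long[] s : splits) {
            SplitXmlReader r = new SplitXmlReader(
                    new ByteArrayInputStream(file), s[0], s[1], "<page>", "</page>");
            for (Map.Entry<Long, String> e = r.next(); e != null; e = r.next()) {
                System.out.println("split@" + s[0] + ": " + e.getKey() + " " + e.getValue());
            }
        }
    }
}
```

With the boundary at byte 20, the first split returns both records (offsets 3 and 17) and the second split returns none, so the straddling record is neither lost nor duplicated.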
 
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
org.apache.hadoop.mapred.FileInputFormat.Counter
 
Field Summary
static java.lang.String END_TAG_KEY
          Defines the end tag of a complete input entry.
static java.lang.String START_TAG_KEY
          Defines the start tag of a complete input entry.
 
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
LOG
 
Constructor Summary
XMLInputFormat()
           
 
Method Summary
 void configure(org.apache.hadoop.mapred.JobConf jobConf)
           
 org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit, org.apache.hadoop.mapred.JobConf jobConf, org.apache.hadoop.mapred.Reporter reporter)
           
 
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPaths, getInputPathFilter, getInputPaths, getSplits, setInputPathFilter, setInputPaths, setInputPaths
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

START_TAG_KEY

public static final java.lang.String START_TAG_KEY
Defines the start tag of a complete input entry.

See Also:
Constant Field Values

END_TAG_KEY

public static final java.lang.String END_TAG_KEY
Defines the end tag of a complete input entry.

See Also:
Constant Field Values

Constructor Detail

XMLInputFormat

public XMLInputFormat()

Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf jobConf)
Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
Overrides:
configure in class org.apache.hadoop.mapred.TextInputFormat
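This page does not document what `configure` reads. Assuming the delimiting tags are supplied through the job configuration under `START_TAG_KEY` and `END_TAG_KEY` (in a Hadoop job, presumably via `conf.set(XMLInputFormat.START_TAG_KEY, "<page>")`), the following stdlib sketch uses `java.util.Properties` as a stand-in for `JobConf`. The `TagConfig` class and its key strings are hypothetical placeholders; the real constant values are listed under Constant Field Values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

/** Hypothetical stand-in for configure(JobConf): reads and validates the delimiting tags. */
public class TagConfig {
    // Placeholder key names, NOT the real START_TAG_KEY/END_TAG_KEY values.
    static final String START_TAG_KEY = "example.xml.start.tag";
    static final String END_TAG_KEY = "example.xml.end.tag";

    final byte[] startTag;
    final byte[] endTag;

    TagConfig(Properties conf) {
        this.startTag = require(conf, START_TAG_KEY);
        this.endTag = require(conf, END_TAG_KEY);
    }

    /** Fails fast at configuration time instead of silently matching nothing later. */
    private static byte[] require(Properties conf, String key) {
        String value = conf.getProperty(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("missing required property: " + key);
        }
        // This sketch matches tags as raw bytes, so it encodes them once up front.
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(START_TAG_KEY, "<page>");
        conf.setProperty(END_TAG_KEY, "</page>");
        TagConfig tags = new TagConfig(conf);
        System.out.println(tags.startTag.length + " " + tags.endTag.length);
    }
}
```

Validating once during configuration means a missing tag surfaces as a clear error when the job starts, rather than as a reader that silently produces no records.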

getRecordReader

public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit,
        org.apache.hadoop.mapred.JobConf jobConf,
        org.apache.hadoop.mapred.Reporter reporter)
        throws java.io.IOException
Specified by:
getRecordReader in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Overrides:
getRecordReader in class org.apache.hadoop.mapred.TextInputFormat
Throws:
java.io.IOException
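Since each value returned by the record reader is uninterpreted text, a mapper has to parse it itself. A minimal sketch using the JDK's DOM parser, assuming each record is a well-formed XML fragment with a single root element (which holds when the delimiting tags are kept and the source is well-formed); `RecordParser` and `firstDescendantText` are hypothetical names. DOCTYPE declarations are rejected as a precaution against entity-expansion attacks.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Sketch: interpreting one record value downstream of the reader (hypothetical helper). */
public class RecordParser {

    /** Parses one record; returns the text of the first descendant named tagName, or null. */
    public static String firstDescendantText(String record, String tagName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Reject DOCTYPE declarations, since record text comes from input files.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(record.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        NodeList nodes = root.getElementsByTagName(tagName);  // searches all descendants
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String value = "<page><title>Graph</title><id>42</id></page>";
        System.out.println(firstDescendantText(value, "id"));
    }
}
```

DOM keeps the example short; for very large records, a streaming parser such as StAX would avoid building the whole tree in memory.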