com.intel.hadoop.graphbuilder.preprocess.inputformat
Class XMLInputFormat
java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
org.apache.hadoop.mapred.TextInputFormat
com.intel.hadoop.graphbuilder.preprocess.inputformat.XMLInputFormat
- All Implemented Interfaces:
- org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>, org.apache.hadoop.mapred.JobConfigurable
public class XMLInputFormat
- extends org.apache.hadoop.mapred.TextInputFormat
Built-in InputFormat for XML, borrowed from Cloud9:
github.com/lintool/Cloud9/blob/master/src/dist/edu/umd/cloud9/collection/XMLInputFormat.java.
The class recognizes begin-of-document and end-of-document tags only:
everything between those delimiting tags is returned in an uninterpreted Text
object.
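The tag-delimited splitting described above can be illustrated with a minimal, standard-library-only sketch. This is not the GraphBuilder or Cloud9 implementation: the class name TagDelimitedRecords and the method extract are hypothetical, the input is an in-memory String rather than a Hadoop input split, and keeping the delimiting tags inside each returned record is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GraphBuilder: illustrates splitting input
// into records using only a start tag and an end tag, with no XML parsing.
final class TagDelimitedRecords {

    // Returns every span that begins with startTag and ends with the next
    // endTag. Assumption: the tags themselves are kept in each record.
    // Text outside the tags and an unterminated trailing record are skipped.
    static List<String> extract(String input, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(startTag, pos);
            if (start < 0) {
                break; // no further start tag
            }
            int end = input.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // start tag without a matching end tag
            }
            end += endTag.length();
            // The record content is copied verbatim and left uninterpreted.
            records.add(input.substring(start, end));
            pos = end;
        }
        return records;
    }

    public static void main(String[] args) {
        String input = "header<page><title>A</title></page>noise<page>B</page>";
        for (String record : extract(input, "<page>", "</page>")) {
            System.out.println(record);
        }
    }
}
```

The sketch treats the tags as plain strings, so nested elements, attributes and malformed markup inside a record are passed through untouched, which matches the "uninterpreted" behaviour described for this class.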
Nested Class Summary |
static class |
XMLInputFormat.XMLRecordReader
RecordReader for XML documents. Recognizes begin-of-document and
end-of-document tags only, and returns everything between the
delimiters as a Text object. |
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat |
org.apache.hadoop.mapred.FileInputFormat.Counter |
Field Summary |
static java.lang.String |
END_TAG_KEY
Job configuration key that defines the end tag of a complete input entry. |
static java.lang.String |
START_TAG_KEY
Job configuration key that defines the start tag of a complete input entry. |
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat |
LOG |
Method Summary |
void |
configure(org.apache.hadoop.mapred.JobConf jobConf)
|
org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> |
getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit,
org.apache.hadoop.mapred.JobConf jobConf,
org.apache.hadoop.mapred.Reporter reporter)
|
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat |
addInputPath, addInputPaths, getInputPathFilter, getInputPaths, getSplits, setInputPathFilter, setInputPaths, setInputPaths |
Methods inherited from class java.lang.Object |
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
START_TAG_KEY
public static final java.lang.String START_TAG_KEY
- Job configuration key that defines the start tag of a complete input entry.
- See Also:
- Constant Field Values
END_TAG_KEY
public static final java.lang.String END_TAG_KEY
- Job configuration key that defines the end tag of a complete input entry.
- See Also:
- Constant Field Values
XMLInputFormat
public XMLInputFormat()
configure
public void configure(org.apache.hadoop.mapred.JobConf jobConf)
- Specified by:
configure
in interface org.apache.hadoop.mapred.JobConfigurable
- Overrides:
configure
in class org.apache.hadoop.mapred.TextInputFormat
getRecordReader
public org.apache.hadoop.mapred.RecordReader<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text> getRecordReader(org.apache.hadoop.mapred.InputSplit inputSplit,
org.apache.hadoop.mapred.JobConf jobConf,
org.apache.hadoop.mapred.Reporter reporter)
throws java.io.IOException
- Specified by:
getRecordReader
in interface org.apache.hadoop.mapred.InputFormat<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
- Overrides:
getRecordReader
in class org.apache.hadoop.mapred.TextInputFormat
- Throws:
java.io.IOException