Usage examples for org.apache.hadoop.mapred.KeyValueTextInputFormat (subclass usage)
From source file StreamWikiDumpInputFormat.java
public class StreamWikiDumpInputFormat extends KeyValueTextInputFormat {
    private static final String KEY_EXCLUDE_PAGE_PATTERN = "org.wikimedia.wikihadoop.excludePagesWith";
    private static final String KEY_PREVIOUS_REVISION = "org.wikimedia.wikihadoop.previousRevision";
    private static final String KEY_SKIP_FACTOR = "org.wikimedia.wikihadoop.skipFactor";
    private CompressionCodecFactory compressionCodecs = null;
From source file org.wikimedia.wikihadoop.StreamWikiDumpInputFormat.java
/** An InputFormat implementation that splits a Wikimedia dump file into page fragments and emits them as input records.
* The record reader embedded in this input format converts a page into a sequence of page-like elements, each of which contains two consecutive revisions. Output is given as keys with empty values.
*
* For example, given the following input containing two pages and four revisions,
* <pre><code>
* <page>
From source file wiki.hadoop.mapred.lib.input.StreamWikiDumpInputFormat.java
/** An InputFormat implementation that splits a Wikimedia dump file into page fragments and emits them as input records.
* The record reader embedded in this input format converts a page into a sequence of page-like elements, each of which contains two consecutive revisions. Output is given as keys with empty values.
*
* For example, given the following input containing two pages and four revisions,
* <pre><code>
* <page>
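
The Javadoc excerpts above are cut off before their example output, so the following is only a minimal sketch of the "two consecutive revisions per element" idea, not the actual StreamWikiDumpInputFormat record reader. The class name RevisionPairs and the method pairConsecutive are hypothetical, revisions are modeled as plain strings rather than XML, and how the real reader treats the first revision of a page is not shown in the excerpt, so this sketch simply emits one pair for each adjacent pair of revisions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Hadoop or wikihadoop.
final class RevisionPairs {

    // Returns one {previous, current} pair for each adjacent pair of revisions.
    // Assumption: a page with n revisions yields n - 1 pairs.
    static List<String[]> pairConsecutive(List<String> revisions) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 1; i < revisions.size(); i++) {
            pairs.add(new String[] { revisions.get(i - 1), revisions.get(i) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Placeholder revision identifiers standing in for <revision> elements.
        List<String> revisions = Arrays.asList("rev-1", "rev-2", "rev-3");
        for (String[] pair : pairConsecutive(revisions)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Under these assumptions, each emitted pair would correspond to one page-like record containing a revision together with its predecessor, which is the shape a revision-diffing mapper would consume.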