Example usage for org.apache.hadoop.mapred KeyValueTextInputFormat subclass-usage

Introduction

This page collects example usages of classes that extend org.apache.hadoop.mapred.KeyValueTextInputFormat.

Usage

From source file StreamWikiDumpInputFormat.java

public class StreamWikiDumpInputFormat extends KeyValueTextInputFormat {

    private static final String KEY_EXCLUDE_PAGE_PATTERN = "org.wikimedia.wikihadoop.excludePagesWith";
    private static final String KEY_PREVIOUS_REVISION = "org.wikimedia.wikihadoop.previousRevision";
    private static final String KEY_SKIP_FACTOR = "org.wikimedia.wikihadoop.skipFactor";
    private CompressionCodecFactory compressionCodecs = null;

From source file org.wikimedia.wikihadoop.StreamWikiDumpInputFormat.java

/** An InputFormat implementation that splits a Wikimedia dump file into page fragments and emits them as input records.
 * The record reader embedded in this input format converts a page into a sequence of page-like elements, each of which contains two consecutive revisions. Output is given as keys with empty values.
 *
 * For example, given the following input containing two pages and four revisions,
 * <pre><code>
 *  &lt;page&gt;

From source file wiki.hadoop.mapred.lib.input.StreamWikiDumpInputFormat.java

/** An InputFormat implementation that splits a Wikimedia dump file into page fragments and emits them as input records.
 * The record reader embedded in this input format converts a page into a sequence of page-like elements, each of which contains two consecutive revisions. Output is given as keys with empty values.
 *
 * For example, given the following input containing two pages and four revisions,
 * <pre><code>
 *  &lt;page&gt;
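
The Javadoc excerpts above are cut off before their example finishes. The idea they describe, turning one page's revisions into elements that each hold two consecutive revisions, can be illustrated with a small standalone sketch. This is not the StreamWikiDumpInputFormat implementation: the class name RevisionPairs and the method consecutivePairs are hypothetical, revisions are stood in for by plain strings, and the sketch assumes a simple sliding window of size two. How the real record reader treats the first revision of a page is not shown in the excerpts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop or wikihadoop.
public class RevisionPairs {

    /**
     * Returns every pair of adjacent revisions in their original order.
     * A page with n revisions yields n - 1 pairs; a page with fewer than
     * two revisions yields none (an assumption made for this sketch).
     */
    static List<String[]> consecutivePairs(List<String> revisions) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i + 1 < revisions.size(); i++) {
            pairs.add(new String[] { revisions.get(i), revisions.get(i + 1) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Placeholder revision labels standing in for <revision> elements.
        List<String> revisions = List.of("rev1", "rev2", "rev3");
        for (String[] pair : consecutivePairs(revisions)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

In an actual input format this pairing would happen inside the record reader while it streams the dump, so that each emitted key carries one page-like fragment with its two revisions.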