com.ebay.erl.mobius.core.mapred
Class TSVMapper

java.lang.Object
  extended by org.apache.hadoop.mapred.MapReduceBase
      extended by com.ebay.erl.mobius.core.datajoin.DataJoinMapper<IK,IV,org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>>
          extended by com.ebay.erl.mobius.core.mapred.AbstractMobiusMapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
              extended by com.ebay.erl.mobius.core.mapred.TSVMapper
All Implemented Interfaces:
java.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text,org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>>

public class TSVMapper
extends AbstractMobiusMapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>

A mapper that parses the values (in Text format) into Tuple elements.

This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache Hadoop, which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan


Field Summary
protected  java.lang.String delimiter
          Delimiter of the underlying file. It is tab by default and can be changed using TSVDatasetBuilder.setDelimiter(String).
protected  java.lang.String[] schema
          Schema of this TSV dataset.
 
Fields inherited from class com.ebay.erl.mobius.core.mapred.AbstractMobiusMapper
_100MB, _COUNTER_FILTERED_RECORD, _COUNTER_INPUT_RECORD, _COUNTER_INVALIDATE_FORMAT_RECORD, _COUNTER_OUTPUT_RECORD, _IS_MAP_ONLY_JOB, computedColumns, counterThread, currentDatasetID, dataset_display_id, key_columns, projection_order, reporterSet, tuple_criteria, value_columns
 
Fields inherited from class com.ebay.erl.mobius.core.datajoin.DataJoinMapper
conf, hasReducer
 
Constructor Summary
TSVMapper()
           
 
Method Summary
 void configure(org.apache.hadoop.mapred.JobConf conf)
          Sets up the mapper.
 Tuple parse(org.apache.hadoop.io.LongWritable inkey, org.apache.hadoop.io.Text invalue)
          Parses invalue into a Tuple using the schema given in TSVDatasetBuilder.
 
Methods inherited from class com.ebay.erl.mobius.core.mapred.AbstractMobiusMapper
close, get, getDatasetID, joinmap, outputRecords, updateCounter
 
Methods inherited from class com.ebay.erl.mobius.core.datajoin.DataJoinMapper
extractSortValueKeyword, getSortValueComparator, map
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

delimiter

protected java.lang.String delimiter
Delimiter of the underlying file. It is tab by default and can be changed using TSVDatasetBuilder.setDelimiter(String).


schema

protected java.lang.String[] schema
Schema of this TSV dataset.

Constructor Detail

TSVMapper

public TSVMapper()

Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf conf)
Sets up the mapper.

Override this method if extra initialization is needed.

Make sure to call super.configure(JobConf) when overriding.

Sets up the delimiter and schema from the Hadoop configuration.

The delimiter is set in TSVDatasetBuilder.setDelimiter(String) and the schema is set in TSVDatasetBuilder.newInstance(com.ebay.erl.mobius.core.MobiusJob, String, String[]). Both are stored in the Hadoop configuration, and Mobius retrieves the values from there.

Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
Overrides:
configure in class AbstractMobiusMapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
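
The configure contract described above (settings are read back from the job configuration, and an overriding subclass must call super.configure first) can be illustrated with a minimal, self-contained sketch. It does not use the Mobius or Hadoop classes: a plain Map stands in for JobConf, BaseMapper stands in for TSVMapper, and the configuration keys, the comma-joined schema encoding, and the skipHeader option are all hypothetical names chosen for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class ConfigureSketch {

    // Hypothetical configuration keys; the real Mobius key names are not shown in this page.
    static final String KEY_DELIMITER = "example.tsv.delimiter";
    static final String KEY_SCHEMA = "example.tsv.schema";
    static final String KEY_SKIP_HEADER = "example.tsv.skipHeader";

    // Stand-in for TSVMapper: reads delimiter and schema back from the configuration.
    static class BaseMapper {
        protected String delimiter;
        protected String[] schema;

        public void configure(Map<String, String> conf) {
            // Tab is the default when no delimiter was stored.
            this.delimiter = conf.getOrDefault(KEY_DELIMITER, "\t");
            // Assumption: the schema is stored as a comma-joined string.
            this.schema = conf.getOrDefault(KEY_SCHEMA, "").split(",");
        }
    }

    // A subclass that needs extra initialization.
    static class HeaderAwareMapper extends BaseMapper {
        protected boolean skipHeader;

        @Override
        public void configure(Map<String, String> conf) {
            // Without this call, delimiter and schema would stay null.
            super.configure(conf);
            this.skipHeader = Boolean.parseBoolean(conf.getOrDefault(KEY_SKIP_HEADER, "false"));
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY_DELIMITER, "|");
        conf.put(KEY_SCHEMA, "id,name,price");
        conf.put(KEY_SKIP_HEADER, "true");

        HeaderAwareMapper mapper = new HeaderAwareMapper();
        mapper.configure(conf);
        System.out.println(mapper.delimiter + " " + Arrays.toString(mapper.schema) + " " + mapper.skipHeader);
    }
}
```

Calling the superclass method first means the inherited fields are already populated when the subclass runs its own setup.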

parse

public Tuple parse(org.apache.hadoop.io.LongWritable inkey,
                   org.apache.hadoop.io.Text invalue)
            throws java.util.IllegalFormatException,
                   java.io.IOException
Parses invalue into a Tuple using the schema given in TSVDatasetBuilder.

invalue is split by the delimiter and the resulting values are stored in a Tuple, all as java.lang.String; the schema is then assigned to the tuple.

If the value array (the result of splitting invalue) is shorter than the schema, null is placed in the columns that have no value.

If the value array is longer than the schema, "IDX_$i" is used as the name for each unnamed value, where $i starts from the length of the schema.

Specified by:
parse in class AbstractMobiusMapper<org.apache.hadoop.io.LongWritable,org.apache.hadoop.io.Text>
Throws:
java.util.IllegalFormatException
java.io.IOException
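
The padding and naming rules described for parse can be sketched in plain Java. This is an illustration of the documented behavior, not the Mobius implementation: a LinkedHashMap stands in for Tuple, the class and method names are hypothetical, and two details are assumptions that the documentation does not state, namely that the delimiter is treated as a literal string rather than a regular expression and that trailing empty fields are kept.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

final class TsvParseSketch {

    // Splits a line on the delimiter and pairs each value with a column name.
    static Map<String, String> parse(String line, String delimiter, String[] schema) {
        // Assumption: literal delimiter (Pattern.quote); limit -1 keeps trailing empty fields.
        String[] values = line.split(Pattern.quote(delimiter), -1);

        Map<String, String> tuple = new LinkedHashMap<>();
        int width = Math.max(values.length, schema.length);
        for (int i = 0; i < width; i++) {
            // Values beyond the schema are named IDX_<i>, starting at schema.length.
            String name = i < schema.length ? schema[i] : "IDX_" + i;
            // Schema columns with no matching value are filled with null.
            String value = i < values.length ? values[i] : null;
            tuple.put(name, value);
        }
        return tuple;
    }

    public static void main(String[] args) {
        String[] schema = {"id", "name", "price"};
        // Fewer values than columns: prints {id=1, name=apple, price=null}
        System.out.println(parse("1\tapple", "\t", schema));
        // More values than columns: prints {id=1, name=apple, price=0.5, IDX_3=fruit}
        System.out.println(parse("1\tapple\t0.5\tfruit", "\t", schema));
    }
}
```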