com.ebay.erl.mobius.core.mapred
Class AbstractMobiusMapper<IK,IV>

java.lang.Object
  extended by org.apache.hadoop.mapred.MapReduceBase
      extended by com.ebay.erl.mobius.core.datajoin.DataJoinMapper<IK,IV,org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>>
          extended by com.ebay.erl.mobius.core.mapred.AbstractMobiusMapper<IK,IV>
Type Parameters:
IK - input key type.
IV - input value type.
All Implemented Interfaces:
java.io.Closeable, org.apache.hadoop.mapred.JobConfigurable, org.apache.hadoop.mapred.Mapper<IK,IV,org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>>
Direct Known Subclasses:
SequenceFileMapper, TSVMapper

public abstract class AbstractMobiusMapper<IK,IV>
extends DataJoinMapper<IK,IV,org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>>

Base class for implementing a customized Mobius mapper.

Extend this class if the built-in mappers, TSVMapper and SequenceFileMapper, do not meet your needs.

This class provides filtering (using the user-specified tuple_criteria), evaluation of the user-specified computedColumns, and counter updates.

Override the parse(Object, Object) method to convert each input key-value pair into a Tuple so that Mobius can process the underlying data source.

This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache Hadoop, which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan


Field Summary
static long _100MB
           
protected  long _COUNTER_FILTERED_RECORD
          Number of records filtered out by the user-specified tuple_criteria.
protected  long _COUNTER_INPUT_RECORD
          Number of input records.
protected  long _COUNTER_INVALIDATE_FORMAT_RECORD
          Number of records with an invalid format.
protected  long _COUNTER_OUTPUT_RECORD
          Number of output records.
protected  boolean _IS_MAP_ONLY_JOB
           
protected  java.util.List<ComputedColumns> computedColumns
          ComputedColumns specified by user.
protected  com.ebay.erl.mobius.core.mapred.CounterUpdateThread counterThread
          A background thread responsible for updating the Hadoop counters.
protected  java.lang.String currentDatasetID
          The current dataset ID.
protected  java.lang.String dataset_display_id
          The normalized name of the dataset currently being processed by this mapper; it is used as the counter ID for updating the corresponding Hadoop counters for this dataset.
protected  java.lang.String[] key_columns
          Columns to be emitted as the key of this mapper.
protected  java.lang.String[] projection_order
          Output column names for a map-only job, e.g. listing.
protected  boolean reporterSet
           
protected  TupleCriterion tuple_criteria
          User-specified filter criteria.
protected  java.lang.String[] value_columns
          Columns to be emitted as the value of this mapper.
 
Fields inherited from class com.ebay.erl.mobius.core.datajoin.DataJoinMapper
conf, hasReducer
 
Constructor Summary
AbstractMobiusMapper()
           
 
Method Summary
 void close()
          Closes the mapper.
 void configure(org.apache.hadoop.mapred.JobConf conf)
          Sets up the mapper.
protected  java.lang.Object get(java.lang.String key)
          Gets an object from the JobConf, assuming the value is Base64-encoded and can be decoded back into a Java object.
 java.lang.String getDatasetID()
          Get the current dataset ID.
 void joinmap(IK key, IV value, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>> output, org.apache.hadoop.mapred.Reporter reporter)
          Performs the map() step for one input record.
protected  void outputRecords(Tuple key, Tuple value, org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>> output)
           
abstract  Tuple parse(IK inkey, IV invalue)
          Parses the input key and input value into a Tuple.
protected  void updateCounter(java.lang.String group, java.lang.String couter, long number)
          Updates the specified counter.
 
Methods inherited from class com.ebay.erl.mobius.core.datajoin.DataJoinMapper
extractSortValueKeyword, getSortValueComparator, map
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tuple_criteria

protected TupleCriterion tuple_criteria
User-specified filter criteria.


key_columns

protected java.lang.String[] key_columns
Columns to be emitted as the key of this mapper.


value_columns

protected java.lang.String[] value_columns
Columns to be emitted as the value of this mapper.


projection_order

protected java.lang.String[] projection_order
Output column names for a map-only job, e.g. listing.


currentDatasetID

protected java.lang.String currentDatasetID
The current dataset ID.


dataset_display_id

protected java.lang.String dataset_display_id
The normalized name of the dataset currently being processed by this mapper; it is used as the counter ID for updating the corresponding Hadoop counters for this dataset.

The name is normalized from the currentDatasetID by removing the serial number part.


counterThread

protected com.ebay.erl.mobius.core.mapred.CounterUpdateThread counterThread
A background thread responsible for updating the Hadoop counters.


_COUNTER_INPUT_RECORD

protected long _COUNTER_INPUT_RECORD
Number of input records.

#INPUT_RECORDS = #FILTERED_RECORDS + #OUTPUT_RECORDS.
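
The identity above can be illustrated with a minimal, self-contained sketch. It is not Mobius code: CounterSketch, its fields, and the use of java.util.function.Predicate in place of TupleCriterion are all hypothetical stand-ins, and plain strings stand in for tuples. It only shows bookkeeping under which every input record is counted exactly once as either filtered or output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for the counter bookkeeping; not part of Mobius. */
class CounterSketch {
    public long inputRecords;    // plays the role of _COUNTER_INPUT_RECORD
    public long filteredRecords; // plays the role of _COUNTER_FILTERED_RECORD
    public long outputRecords;   // plays the role of _COUNTER_OUTPUT_RECORD

    /**
     * Counts every record as input, then as either output (criterion passed)
     * or filtered (criterion failed). Returns the records that passed.
     */
    public List<String> process(List<String> records, Predicate<String> criterion) {
        List<String> emitted = new ArrayList<>();
        for (String record : records) {
            inputRecords++;
            if (criterion.test(record)) {
                outputRecords++;
                emitted.add(record);
            } else {
                filteredRecords++;
            }
        }
        return emitted;
    }
}
```

After any call to process, inputRecords equals filteredRecords plus outputRecords, which is the relationship stated for the real counters.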


_COUNTER_OUTPUT_RECORD

protected long _COUNTER_OUTPUT_RECORD
Number of output records.


_COUNTER_FILTERED_RECORD

protected long _COUNTER_FILTERED_RECORD
Number of records filtered out by the user-specified tuple_criteria.


_COUNTER_INVALIDATE_FORMAT_RECORD

protected long _COUNTER_INVALIDATE_FORMAT_RECORD
Number of records with an invalid format.


computedColumns

protected java.util.List<ComputedColumns> computedColumns
ComputedColumns specified by user.


_IS_MAP_ONLY_JOB

protected boolean _IS_MAP_ONLY_JOB

_100MB

public static final long _100MB
See Also:
Constant Field Values

reporterSet

protected boolean reporterSet


Constructor Detail

AbstractMobiusMapper

public AbstractMobiusMapper()


Method Detail

configure

public void configure(org.apache.hadoop.mapred.JobConf conf)
Sets up the mapper.

Override this method if extra initialization is needed.

Make sure to call super.configure(JobConf) when overriding.

Specified by:
configure in interface org.apache.hadoop.mapred.JobConfigurable
Overrides:
configure in class DataJoinMapper<IK,IV,org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>>

joinmap

public void joinmap(IK key,
                    IV value,
                    org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>> output,
                    org.apache.hadoop.mapred.Reporter reporter)
             throws java.io.IOException
Performs the map() step for one input record.

Specified by:
joinmap in class DataJoinMapper<IK,IV,org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>>
Throws:
java.io.IOException

outputRecords

protected void outputRecords(Tuple key,
                             Tuple value,
                             org.apache.hadoop.mapred.OutputCollector<org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>> output)
                      throws java.io.IOException
Throws:
java.io.IOException

close

public void close()
           throws java.io.IOException
Closes the mapper.

Specified by:
close in interface java.io.Closeable
Overrides:
close in class org.apache.hadoop.mapred.MapReduceBase
Throws:
java.io.IOException

parse

public abstract Tuple parse(IK inkey,
                            IV invalue)
                     throws java.lang.IllegalArgumentException,
                            java.io.IOException
Parses the input key and input value into a Tuple.

Throws:
java.lang.IllegalArgumentException
java.io.IOException
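
As a rough illustration of what a parse implementation might do, the sketch below turns a pipe-delimited line into named columns. It is self-contained and does not use Mobius classes: PipeDelimitedParseSketch and its schema are hypothetical, and a java.util.Map stands in for Tuple because the Tuple API is not described on this page. Throwing IllegalArgumentException for a malformed record is an assumption about how that declared exception is meant to be used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser; a Map stands in for the Mobius Tuple. */
class PipeDelimitedParseSketch {
    private final String[] schema;

    public PipeDelimitedParseSketch(String... schema) {
        this.schema = schema;
    }

    /**
     * Mirrors the shape of parse(IK, IV): the key is ignored and the value
     * is one pipe-delimited line whose fields are mapped to schema names.
     */
    public Map<String, String> parse(Object inkey, String invalue) {
        if (invalue == null) {
            throw new IllegalArgumentException("record is null");
        }
        // Limit -1 keeps trailing empty fields so the count check is exact.
        String[] fields = invalue.split("\\|", -1);
        if (fields.length != schema.length) {
            throw new IllegalArgumentException(
                "expected " + schema.length + " fields but got " + fields.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < schema.length; i++) {
            row.put(schema[i], fields[i]);
        }
        return row;
    }
}
```

A real subclass would instead build and return a Tuple, and would declare the concrete IK and IV types of its input format.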

updateCounter

protected final void updateCounter(java.lang.String group,
                                   java.lang.String couter,
                                   long number)
Updates the specified counter.


getDatasetID

public final java.lang.String getDatasetID()
Get the current dataset ID.

Specified by:
getDatasetID in class DataJoinMapper<IK,IV,org.apache.hadoop.io.WritableComparable<?>,org.apache.hadoop.io.WritableComparable<?>>

get

protected final java.lang.Object get(java.lang.String key)
                              throws java.io.IOException
Gets an object from the JobConf, assuming the value is Base64-encoded and can be decoded back into a Java object.

If the value from JobConf for the given key is null or empty, null is returned.
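
The round trip described above can be sketched with the standard library alone. This page does not say how Mobius serializes objects, so the use of Java serialization and of java.util.Base64 (Java 8 and later) is an assumption, and Base64ObjectSketch with its encode and decode methods is hypothetical. The null-or-empty check follows the behavior documented for get.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

/** Hypothetical helper; assumes Java serialization, which Mobius may not use. */
class Base64ObjectSketch {

    /** Serializes the object and returns it as a Base64 string. */
    public static String encode(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    /** Decodes a Base64 string back into an object; null or empty yields null. */
    public static Object decode(String encoded) throws IOException {
        if (encoded == null || encoded.isEmpty()) {
            return null;
        }
        byte[] raw = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException("cannot decode configuration value", e);
        }
    }
}
```

In a job, the encoded string would be stored under a configuration key at submission time and decoded inside the mapper, which is the role get plays for values held in the JobConf.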

Throws:
java.io.IOException