com.ebay.erl.mobius.core.model
Class ComputedColumns

java.lang.Object
  extended by com.ebay.erl.mobius.core.model.ComputedColumns
All Implemented Interfaces:
java.io.Serializable

public abstract class ComputedColumns
extends java.lang.Object
implements java.io.Serializable

In a dataset building stage, this base class dynamically computes new columns for each row in the selected dataset.

For example, if a dataset contains the two columns UNIT_PRICE and QUANTITY, users can add a ComputedColumns instance to compute TOTAL_PRICE as follows:

 
 new ComputedColumns("TOTAL_PRICE")
 {
        public void consume(Tuple newRow)
        {
                float price = newRow.getFloat("UNIT_PRICE");
                int quantity = newRow.getInt("QUANTITY");
                
                Tuple result = new Tuple();
                result.put("TOTAL_PRICE", price*quantity);
 
                output(result);
        }
 }
 
 

The consume(Tuple) method can produce more than one row for each newRow: every call to output(Tuple) emits one row.

This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache Hadoop, which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan

See Also:
Serialized Form

Field Summary
protected  java.lang.String[] outputSchema
          the schema of the output Tuple
protected  org.apache.hadoop.mapred.Reporter reporter
           
protected  BigTupleList result
           
 
Constructor Summary
ComputedColumns(java.lang.String... outputSchema)
          Create an instance of ComputedColumns which will add new column(s) to each row in a dataset.
 
Method Summary
abstract  void consume(Tuple newRow)
          Calculate the computed result based on the input row.
 java.lang.String[] getOutputSchema()
          Get the schema of the output of this ComputedColumns.
 BigTupleList getResult()
          Called by the Mobius engine. The returned list contains zero or more Tuples that were computed and emitted in consume(Tuple); each tuple has the same schema as the one specified in the constructor.
protected  void output(Tuple t)
          Once a computed result is ready in consume(Tuple), use this method to output it.
 void reset()
          Called by the Mobius engine for every new row in a dataset to reset the previous result.
 void setReporter(org.apache.hadoop.mapred.Reporter reporter)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

result

protected BigTupleList result

reporter

protected transient org.apache.hadoop.mapred.Reporter reporter

outputSchema

protected java.lang.String[] outputSchema
the schema of the output Tuple

Constructor Detail

ComputedColumns

public ComputedColumns(java.lang.String... outputSchema)
Create an instance of ComputedColumns which will add new column(s) to each row in a dataset.

The schema of the new column(s) is given by outputSchema.
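Because the constructor takes varargs, more than one new column can be declared at once. The following is a minimal, self-contained sketch of that idea, not the Mobius implementation: the nested Tuple and ComputedColumns classes are simplified stand-ins written only so the example runs on its own (the real result field is a BigTupleList, replaced here by a plain List), and the DISCOUNTED_PRICE column and its 50% factor are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiColumnSketch {

    // Hypothetical stand-in for Mobius' Tuple: a column-name -> value map.
    static class Tuple {
        private final Map<String, Object> values = new LinkedHashMap<>();

        void put(String column, Object value) { values.put(column, value); }
        Object get(String column) { return values.get(column); }
        float getFloat(String column) { return ((Number) values.get(column)).floatValue(); }
        int getInt(String column) { return ((Number) values.get(column)).intValue(); }
    }

    // Hypothetical stand-in that mirrors the documented shape of ComputedColumns.
    static abstract class ComputedColumns {
        protected final String[] outputSchema;
        protected final List<Tuple> result = new ArrayList<>(); // assumption: List instead of BigTupleList

        ComputedColumns(String... outputSchema) { this.outputSchema = outputSchema; }

        public abstract void consume(Tuple newRow);
        protected final void output(Tuple t) { result.add(t); }
        public final String[] getOutputSchema() { return outputSchema; }
        public final List<Tuple> getResult() { return result; }
        public final void reset() { result.clear(); }
    }

    // Declares two new columns; every emitted tuple fills both of them.
    static ComputedColumns priceColumns() {
        return new ComputedColumns("TOTAL_PRICE", "DISCOUNTED_PRICE") {
            @Override
            public void consume(Tuple newRow) {
                float price = newRow.getFloat("UNIT_PRICE");
                int quantity = newRow.getInt("QUANTITY");
                float total = price * quantity;

                Tuple t = new Tuple();
                t.put("TOTAL_PRICE", total);
                t.put("DISCOUNTED_PRICE", total * 0.5f); // hypothetical 50% discount
                output(t);
            }
        };
    }

    public static void main(String[] args) {
        Tuple row = new Tuple();
        row.put("UNIT_PRICE", 2.0f);
        row.put("QUANTITY", 3);

        ComputedColumns columns = priceColumns();
        columns.consume(row);

        Tuple out = columns.getResult().get(0);
        System.out.println(out.get("TOTAL_PRICE") + " " + out.get("DISCOUNTED_PRICE"));
    }
}
```

The point of the sketch is that the column names passed to the constructor and the keys put into each emitted tuple are kept identical, which is what the schema requirement on consume(Tuple) asks for.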

Method Detail

consume

public abstract void consume(Tuple newRow)
Calculate the computed result based on the input row. The schema of the computed result must be the same as the outputSchema.

Once a computed result is generated, use output(Tuple) to emit it. Usually output(Tuple) is called once per row, but it may be called multiple times if the logic needs to produce multiple output records per input record.

Example 1: one output record per input

          
 public void consume(Tuple newRow)
 {
   float usd = newRow.getFloat("USD");
   float rate = newRow.getFloat("EXCHANGE_RATE");
   
   Tuple result = new Tuple();
   result.put("TARGET_CURRENCY", usd*rate);
   output(result);
 }
 
 
Example 2: multiple output records per input.
 
 // break down title into tokens, then later we can group by "TOKEN"
 // to calculate the frequency
 public void consume(Tuple newRow)
 {
   String title = newRow.getString("title");
   String[] tokens = title.toLowerCase().split("\\p{Space}+");
   for ( String aToken:tokens )
   {
        Tuple t = new Tuple();
        t.put("TOKEN", aToken);
        output(t);
   }
 }
 
 


output

protected final void output(Tuple t)
Once a computed result is ready in consume(Tuple), use this method to output it.


getOutputSchema

public final java.lang.String[] getOutputSchema()
Get the schema of the output of this ComputedColumns.


getResult

public final BigTupleList getResult()
Called by the Mobius engine. The returned list contains zero or more Tuples that were computed and emitted in consume(Tuple); each tuple has the same schema as the one specified in the constructor.


reset

public final void reset()
Called by the Mobius engine for every new row in a dataset to reset the previous result.
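The per-row call order suggested by the descriptions of reset(), consume(Tuple) and getResult() can be sketched as below. This is an illustration under assumptions, not the engine's code: the order reset, then consume, then getResult is inferred from the method descriptions only, ComputedColumnsStandIn is a hypothetical simplification in which rows and outputs are plain strings, and the run helper stands in for the engine loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LifecycleSketch {

    // Hypothetical stand-in: rows and outputs are plain strings rather than Tuples.
    static abstract class ComputedColumnsStandIn {
        private final List<String> result = new ArrayList<>();

        public abstract void consume(String newRow);
        protected final void output(String value) { result.add(value); }
        public final List<String> getResult() { return result; }
        public final void reset() { result.clear(); }
    }

    // Plays the role of the engine: one reset/consume/getResult cycle per row.
    static List<List<String>> run(List<String> rows) {
        ComputedColumnsStandIn tokenizer = new ComputedColumnsStandIn() {
            @Override
            public void consume(String title) {
                // One output per token, so a single row can yield many results.
                for (String token : title.toLowerCase().split("\\p{Space}+")) {
                    output(token);
                }
            }
        };

        List<List<String>> perRow = new ArrayList<>();
        for (String row : rows) {
            tokenizer.reset();       // drop results left over from the previous row
            tokenizer.consume(row);  // user logic emits zero or more results
            perRow.add(new ArrayList<>(tokenizer.getResult())); // copy before the next reset
        }
        return perRow;
    }

    public static void main(String[] args) {
        // prints [[red, shoes], [blue, suede, shoes]]
        System.out.println(run(Arrays.asList("Red Shoes", "Blue Suede Shoes")));
    }
}
```

Without the reset() call in the loop, the second row's result list would still contain the tokens of the first row, which is the situation the reset step is described as preventing.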


setReporter

public void setReporter(org.apache.hadoop.mapred.Reporter reporter)