com.ebay.erl.mobius.core.function.base
Class Projectable

java.lang.Object
  extended by com.ebay.erl.mobius.core.function.base.Projectable
All Implemented Interfaces:
java.io.Serializable, org.apache.hadoop.conf.Configurable
Direct Known Subclasses:
ExtendFunction, GroupFunction

public class Projectable
extends java.lang.Object
implements java.io.Serializable, org.apache.hadoop.conf.Configurable

Base class for all projection operations.

Users should not extend this class directly; instead, extend ExtendFunction, GroupFunction, or one of their subclasses.

A Projectable takes one or more columns from one or more datasets as its input, then performs a calculation that generates X output rows, where X can be zero or more. The schema of each output row is defined by the setOutputSchema(String...) method.

When providing a custom implementation of a projection operation, users must ensure that the output schema is consistent with the actual output.
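The contract above (input values in, zero or more rows out, every row shaped like the declared schema) can be sketched in plain Java. TokenizeProjectionSketch is a hypothetical class, not part of Mobius, and it does not extend Projectable; it only models the schema-consistency rule, with the schema {"token", "position"} chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a projection (NOT the Mobius API): one input value
// produces zero or more output rows, each checked against the declared schema.
class TokenizeProjectionSketch {
    private final String[] outputSchema = {"token", "position"};

    String[] getOutputSchema() {
        return outputSchema.clone();
    }

    // Emits one row per whitespace-separated token; blank input yields zero rows.
    List<Object[]> project(String value) {
        List<Object[]> rows = new ArrayList<>();
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return rows;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            rows.add(checkRow(new Object[] {tokens[i], i}));
        }
        return rows;
    }

    // Guards against the schema/output mismatch the documentation warns about.
    private Object[] checkRow(Object[] row) {
        if (row.length != outputSchema.length) {
            throw new IllegalStateException("row " + Arrays.toString(row)
                    + " does not match schema " + Arrays.toString(outputSchema));
        }
        return row;
    }
}
```

For example, project("a bb ccc") yields three rows, while project("   ") yields none; both outcomes satisfy the "zero or more rows" contract.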

This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache Hadoop, which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan

See Also:
GroupFunction, ExtendFunction, Serialized Form

Field Summary
protected  org.apache.hadoop.conf.Configuration conf
           
protected  int hashCode
           
protected  Column[] inputs
          the input columns that are required by this function to compute its result.
protected  java.lang.String[] outputSchema
          the names of the output columns in the getResult() tuple
protected  org.apache.hadoop.mapred.Reporter reporter
           
protected  boolean requireDataFromMultiDatasets
           
 
Constructor Summary
protected Projectable()
          should be invoked by Column only
  Projectable(Column[] inputs)
          Create a Projectable which takes the inputs to compute some result.
 
Method Summary
 boolean calledByCombiner()
           
 boolean equals(java.lang.Object obj)
           
 org.apache.hadoop.conf.Configuration getConf()
           
 Column[] getInputColumns()
          Get the input columns.
 java.lang.String[] getOutputSchema()
          Get the output schema of the result of this function.
 java.util.Set<Dataset> getParticipatedDataset()
          Returns the set of Datasets required to compute the result of this function.
 int hashCode()
           
protected  void init(Column[] inputs)
           
 boolean isCombinable()
          Determines whether this function can be run in a combiner; the default is false.
 boolean requireDataFromMultiDatasets()
          true if this function requires columns from more than one dataset to compute its value.
 void setCalledByCombiner(boolean calledByCombiner)
           
 void setConf(org.apache.hadoop.conf.Configuration conf)
           
 Projectable setOutputSchema(java.lang.String... schema)
          Set the output schema of the result of this function.
 void setReporter(org.apache.hadoop.mapred.Reporter reporter)
           
 java.lang.String toString()
           
 boolean useGroupKeyOnly()
           
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

inputs

protected Column[] inputs
the input columns that are required by this function to compute its result.


outputSchema

protected java.lang.String[] outputSchema
the names of the output columns in the getResult() tuple


conf

protected transient org.apache.hadoop.conf.Configuration conf

hashCode

protected int hashCode

requireDataFromMultiDatasets

protected boolean requireDataFromMultiDatasets

reporter

protected org.apache.hadoop.mapred.Reporter reporter

Constructor Detail

Projectable

public Projectable(Column[] inputs)
Creates a Projectable that takes the given inputs to compute a result. By default, the result schema contains one column per input, named this.getClass().getSimpleName()+"_"+aColumn.getOutputName(), where aColumn is the corresponding input column.

The number of output columns does not have to match the number of input columns; use setOutputSchema(String...) to set the actual output schema.
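The default naming rule and its override can be modelled with a hypothetical stand-in class, MaxSketch, which is not a Mobius class and takes plain output-name strings instead of Column objects. Its setOutputSchema returns this, mirroring the Projectable return type of the real method, so the override can be chained onto construction.

```java
// Hypothetical stand-in for a projection function (NOT the Mobius class).
// The constructor applies the documented default naming rule; setOutputSchema
// replaces the schema and returns this so calls can be chained.
class MaxSketch {
    private String[] outputSchema;

    MaxSketch(String... inputOutputNames) {
        outputSchema = new String[inputOutputNames.length];
        for (int i = 0; i < inputOutputNames.length; i++) {
            // getClass().getSimpleName() + "_" + the input column's output name
            outputSchema[i] = getClass().getSimpleName() + "_" + inputOutputNames[i];
        }
    }

    MaxSketch setOutputSchema(String... schema) {
        this.outputSchema = schema.clone();
        return this;
    }

    String[] getOutputSchema() {
        return outputSchema.clone();
    }
}
```

Here new MaxSketch("price", "qty") reports the columns MaxSketch_price and MaxSketch_qty, while new MaxSketch("price", "qty").setOutputSchema("max_value") reduces the schema to a single column, showing that input and output counts need not match.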


Projectable

protected Projectable()
should be invoked by Column only

Method Detail

init

protected void init(Column[] inputs)

requireDataFromMultiDatasets

public final boolean requireDataFromMultiDatasets()
true if this function requires columns from more than one dataset to compute its value.


hashCode

public int hashCode()
Overrides:
hashCode in class java.lang.Object

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

equals

public boolean equals(java.lang.Object obj)
Overrides:
equals in class java.lang.Object

setOutputSchema

public Projectable setOutputSchema(java.lang.String... schema)
Set the output schema of the result of this function.


getOutputSchema

public java.lang.String[] getOutputSchema()
Get the output schema of the result of this function.


getInputColumns

public Column[] getInputColumns()
Get the input columns.


getParticipatedDataset

public java.util.Set<Dataset> getParticipatedDataset()
Returns the set of Datasets required to compute the result of this function.

If only one Dataset is required, this function should be applied before the cross-product phase.
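The relationship between the participating datasets and requireDataFromMultiDatasets() can be sketched as follows, assuming (this is not stated above) that the participating datasets are exactly the datasets the input columns belong to. ColumnRef and ParticipationSketch are hypothetical names, and plain dataset names stand in for Dataset objects.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical column reference tagged with its dataset name (NOT Mobius's Column).
final class ColumnRef {
    final String dataset;
    final String name;

    ColumnRef(String dataset, String name) {
        this.dataset = dataset;
        this.name = name;
    }
}

final class ParticipationSketch {
    private ParticipationSketch() {}

    // Collects the distinct datasets the input columns come from, in first-seen order.
    static Set<String> participatedDatasets(ColumnRef... inputs) {
        Set<String> datasets = new LinkedHashSet<>();
        for (ColumnRef c : inputs) {
            datasets.add(c.dataset);
        }
        return datasets;
    }

    // Mirrors the documented meaning of requireDataFromMultiDatasets():
    // true only when columns from more than one dataset are needed.
    static boolean requiresMultipleDatasets(ColumnRef... inputs) {
        return participatedDatasets(inputs).size() > 1;
    }
}
```

Under this model, a function over two columns of the same dataset could run before the cross-product phase, whereas one mixing columns from two datasets would have to wait until rows from both are available.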


getConf

public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable

useGroupKeyOnly

public final boolean useGroupKeyOnly()

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable

setReporter

public void setReporter(org.apache.hadoop.mapred.Reporter reporter)

isCombinable

public boolean isCombinable()
Determines whether this function can be run in a combiner; the default is false.
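A combiner applies a function to partial groups of values before the reducer merges the partial results, so a function is only safe to treat as combinable when that two-stage evaluation gives the same answer as a single pass. The sketch below is a general MapReduce illustration, not Mobius code; CombinerSketch and its methods are hypothetical. Splitting [1, 2, 3, 10, 20] into [1, 2, 3] and [10, 20], the two-stage sum is 36 either way, while the two-stage lower median is 2 instead of 3.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// General illustration (NOT Mobius code) of the combinability requirement.
final class CombinerSketch {
    private CombinerSketch() {}

    static long sum(List<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Lower median: middle element of a sorted copy (assumes a non-empty list).
    static long median(List<Long> values) {
        List<Long> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        return sorted.get((sorted.size() - 1) / 2);
    }

    // Applies f to each partition (the combiner step), then applies f again
    // to the partial results (the reducer step).
    static long twoStage(Function<List<Long>, Long> f, List<List<Long>> partitions) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> partition : partitions) {
            partials.add(f.apply(partition));
        }
        return f.apply(partials);
    }
}
```

Reusing the same function for the merge step only works for functions such as sum; a count, for instance, would need its partial counts summed rather than counted again.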


calledByCombiner

public boolean calledByCombiner()

setCalledByCombiner

public void setCalledByCombiner(boolean calledByCombiner)