java.lang.Object
  com.ebay.erl.mobius.core.model.ComputedColumns
public abstract class ComputedColumns
In a dataset-building stage, this base class dynamically computes new columns for each row in the selected dataset.
For example, if a dataset contains the two columns UNIT_PRICE and QUANTITY, users can add a ComputedColumns to compute TOTAL_PRICE as follows:
new ComputedColumns("TOTAL_PRICE")
{
    public void consume(Tuple newRow)
    {
        // read the two input columns of the current row
        float price = newRow.getFloat("UNIT_PRICE");
        int quantity = newRow.getInt("QUANTITY");
        // build a tuple holding only the new column, then emit it
        Tuple result = new Tuple();
        result.put("TOTAL_PRICE", price*quantity);
        output(result);
    }
}
The consume(Tuple) method can produce more than one row for each newRow if it invokes output(Tuple) multiple times.
This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache hadoop which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan
Field Summary | |
---|---|
protected java.lang.String[] | outputSchema: the schema of the output Tuple |
protected org.apache.hadoop.mapred.Reporter | reporter |
protected BigTupleList | result |
Constructor Summary | |
---|---|
ComputedColumns(java.lang.String... outputSchema) | Create an instance of ComputedColumns which will add new column(s) to each row in a dataset. |
Method Summary | |
---|---|
abstract void | consume(Tuple newRow): Calculate the computed result based on the input row. |
java.lang.String[] | getOutputSchema(): Get the schema of the output of this ComputedColumns. |
BigTupleList | getResult(): Called by the Mobius engine. The returned list contains zero or more Tuples computed and emitted in consume(Tuple); each tuple has the same schema as the one specified in the constructor. |
protected void | output(Tuple t): When the computed result(s) are ready in consume(Tuple), use this method to output them. |
void | reset(): Called by the Mobius engine for every new row in a dataset, to reset the result of the previous row. |
void | setReporter(org.apache.hadoop.mapred.Reporter reporter) |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected BigTupleList result
protected transient org.apache.hadoop.mapred.Reporter reporter
protected java.lang.String[] outputSchema
The schema of the output Tuple.
Constructor Detail |
---|
public ComputedColumns(java.lang.String... outputSchema)
Create an instance of ComputedColumns, which will add new column(s) to each row in a dataset.
The schema of the new column(s) is given by outputSchema.
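Because the constructor takes varargs, a single ComputedColumns can declare several new columns at once. The sketch below illustrates this under stated assumptions: SchemaSketch is a hypothetical stand-in for ComputedColumns that uses a java.util.Map in place of Tuple, the IS_BULK column and its threshold of 10 are invented for illustration, and the schema check inside output is this sketch's own addition rather than documented Mobius behaviour.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for ComputedColumns; a Map plays the role of Tuple.
abstract class SchemaSketch {
    protected final String[] outputSchema;
    private Map<String, Object> lastOutput;

    SchemaSketch(String... outputSchema) {
        this.outputSchema = outputSchema;
    }

    abstract void consume(Map<String, Object> newRow);

    // Assumption of this sketch: reject records whose columns differ from the schema.
    protected final void output(Map<String, Object> t) {
        Set<String> declared = new LinkedHashSet<>(Arrays.asList(outputSchema));
        if (!t.keySet().equals(declared)) {
            throw new IllegalArgumentException(
                "expected columns " + declared + " but got " + t.keySet());
        }
        lastOutput = t;
    }

    final Map<String, Object> getLastOutput() {
        return lastOutput;
    }
}

class SchemaDemo {
    // Computes both declared columns for one input row and returns the emitted record.
    static Map<String, Object> compute(float unitPrice, int quantity) {
        SchemaSketch columns = new SchemaSketch("TOTAL_PRICE", "IS_BULK") {
            @Override
            void consume(Map<String, Object> newRow) {
                float price = (Float) newRow.get("UNIT_PRICE");
                int qty = (Integer) newRow.get("QUANTITY");
                Map<String, Object> result = new LinkedHashMap<>();
                result.put("TOTAL_PRICE", price * qty);
                result.put("IS_BULK", qty >= 10); // hypothetical second column
                output(result);
            }
        };
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("UNIT_PRICE", unitPrice);
        row.put("QUANTITY", quantity);
        columns.consume(row);
        return columns.getLastOutput();
    }
}
```

Every record passed to output carries exactly the columns named in the constructor, which matches the statement on this page that each emitted tuple has the schema specified there.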
Method Detail |
---|
public abstract void consume(Tuple newRow)
Calculate the computed result based on the input row. Each emitted Tuple should follow the schema given by outputSchema.
When a computed result is generated, the user can call output(Tuple) to emit it. Usually output(Tuple) is called once per row, but it may be called multiple times if the logic needs to produce multiple output records per input record.
Example 1: one output record per input.
public void consume(Tuple newRow)
{
    float usd = newRow.getFloat("USD");
    float rate = newRow.getFloat("EXCHANGE_RATE");
    Tuple result = new Tuple();
    result.put("TARGET_CURRENCY", usd*rate);
    output(result);
}
Example 2: multiple output records per input.
// break down title into tokens, then later we can group by "TOKEN"
// to calculate the frequency
public void consume(Tuple newRow)
{
    String title = newRow.getString("title");
    String[] tokens = title.toLowerCase().split("\\p{Space}+");
    for ( String aToken:tokens )
    {
        Tuple t = new Tuple();
        t.put("TOKEN", aToken);
        output(t);
    }
}
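Since getResult() is documented as holding zero to many tuples, consume may also emit nothing for a row. The following is a minimal sketch of that case, assuming hypothetical stand-ins (FilterSketch for ComputedColumns, a java.util.Map for Tuple) so that it runs without Mobius or Hadoop on the classpath. How the engine treats an input row that yields no records is not stated on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ComputedColumns; a Map plays the role of Tuple.
abstract class FilterSketch {
    private final List<Map<String, Object>> result = new ArrayList<>();

    abstract void consume(Map<String, Object> newRow);

    protected final void output(Map<String, Object> t) {
        result.add(t);
    }

    final List<Map<String, Object>> getResult() {
        return result;
    }
}

class FilterDemo {
    // Returns how many records consume() emits for one row with the given quantity.
    static int emittedFor(int quantity) {
        FilterSketch totals = new FilterSketch() {
            @Override
            void consume(Map<String, Object> newRow) {
                int qty = (Integer) newRow.get("QUANTITY");
                if (qty <= 0) {
                    return; // nothing to compute: emit zero records for this row
                }
                float price = (Float) newRow.get("UNIT_PRICE");
                Map<String, Object> t = new LinkedHashMap<>();
                t.put("TOTAL_PRICE", price * qty);
                output(t);
            }
        };
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("UNIT_PRICE", 2.5f);
        row.put("QUANTITY", quantity);
        totals.consume(row);
        return totals.getResult().size();
    }
}
```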
protected final void output(Tuple t)
When the computed result(s) are ready in consume(Tuple), use this method to output them.
public final java.lang.String[] getOutputSchema()
Get the schema of the output of this ComputedColumns.
public final BigTupleList getResult()
Called by the Mobius engine. The returned list contains zero or more Tuples computed and emitted in consume(Tuple); each tuple has the same schema as the one specified in the constructor.
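public final void reset()
Called by the Mobius engine for every new row in a dataset, to reset the result of the previous row.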
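The engine-side call order implied by reset(), consume(Tuple) and getResult() can be sketched as follows. This is an illustration under stated assumptions, not the Mobius implementation: ComputedColumnsSketch and LifecycleDemo are hypothetical stand-ins, rows are plain java.util.Map objects instead of Tuple, and the result buffer is an ArrayList instead of BigTupleList.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in mirroring the method names documented on this page.
abstract class ComputedColumnsSketch {
    private final String[] outputSchema;
    private final List<Map<String, Object>> result = new ArrayList<>();

    ComputedColumnsSketch(String... outputSchema) {
        this.outputSchema = outputSchema;
    }

    // Implemented by the user: compute from one input row and call output().
    abstract void consume(Map<String, Object> newRow);

    // Buffers one computed record for the current input row.
    protected final void output(Map<String, Object> t) {
        result.add(t);
    }

    final String[] getOutputSchema() {
        return outputSchema;
    }

    final List<Map<String, Object>> getResult() {
        return result;
    }

    // Clears records left over from the previous input row.
    final void reset() {
        result.clear();
    }
}

class LifecycleDemo {
    // Plays the role of the engine: reset, consume, then read the result, once per row.
    static List<Integer> recordsPerRow(List<String> titles) {
        ComputedColumnsSketch tokenizer = new ComputedColumnsSketch("TOKEN") {
            @Override
            void consume(Map<String, Object> newRow) {
                String title = (String) newRow.get("title");
                for (String token : title.toLowerCase().split("\\p{Space}+")) {
                    Map<String, Object> t = new LinkedHashMap<>();
                    t.put("TOKEN", token);
                    output(t);
                }
            }
        };
        List<Integer> counts = new ArrayList<>();
        for (String title : titles) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("title", title);
            tokenizer.reset();
            tokenizer.consume(row);
            counts.add(tokenizer.getResult().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints [2, 4]
        System.out.println(recordsPerRow(List.of("Red Bike", "New Blue Mountain Bike")));
    }
}
```

Without the reset() call, the second row in this sketch would report 6 records, because the two tokens emitted for the first row would still be in the buffer.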
public void setReporter(org.apache.hadoop.mapred.Reporter reporter)