com.ebay.erl.mobius.core.builder
Class DatasetBuildersFactory

java.lang.Object
  extended by com.ebay.erl.mobius.core.builder.DatasetBuildersFactory

public class DatasetBuildersFactory
extends java.lang.Object

Gets the implementation of AbstractDatasetBuilder based on a given OutputFormat.

This class is used by the Mobius engine to build a dataset from an intermediate result based on its output format.

By default, Mobius uses TSVDatasetBuilder to build a dataset if the intermediate result of an analysis flow is in text format. Alternatively, Mobius uses SeqFileDatasetBuilder if the intermediate result is in sequence file format.

The intermediate result is created by the Mobius job. Users should not use this class to build their own dataset on HDFS.
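
The default selection described above can be pictured as a lookup table keyed by output format class. The following is a minimal sketch, not Mobius source code: every type in it is a hypothetical stand-in declared inside the example, and it assumes the defaults are keyed on a text output format and a sequence file output format.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultBuilderLookupSketch {
    // Hypothetical stand-ins for the Hadoop output format types (not the real classes).
    interface OutputFormat {}
    static class TextOutputFormat implements OutputFormat {}
    static class SequenceFileOutputFormat implements OutputFormat {}

    // Hypothetical stand-ins for the Mobius builder types (not the real classes).
    static abstract class AbstractDatasetBuilder {}
    static class TSVDatasetBuilder extends AbstractDatasetBuilder {}
    static class SeqFileDatasetBuilder extends AbstractDatasetBuilder {}

    // Assumed default table: text results map to the TSV builder,
    // sequence file results map to the sequence file builder.
    static Map<Class<? extends OutputFormat>, Class<? extends AbstractDatasetBuilder>> defaults() {
        Map<Class<? extends OutputFormat>, Class<? extends AbstractDatasetBuilder>> m = new HashMap<>();
        m.put(TextOutputFormat.class, TSVDatasetBuilder.class);
        m.put(SequenceFileOutputFormat.class, SeqFileDatasetBuilder.class);
        return m;
    }

    // Returns the simple name of the builder class chosen for the given format.
    static String builderNameFor(Class<? extends OutputFormat> format) {
        return defaults().get(format).getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(builderNameFor(TextOutputFormat.class));
        System.out.println(builderNameFor(SequenceFileOutputFormat.class));
    }
}
```

In this sketch the table is rebuilt on each call for simplicity; the real class keeps its mapping in the _DATASET_BUILDERS field.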

This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache Hadoop which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan


Field Summary
protected  java.util.Map<java.lang.Class<? extends org.apache.hadoop.mapred.OutputFormat>,java.lang.Class<? extends AbstractDatasetBuilder>> _DATASET_BUILDERS
          Mapping from a given OutputFormat to an implementation of AbstractDatasetBuilder.
 
Method Summary
 AbstractDatasetBuilder getBuilder(java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> prevJobOutFmt, java.lang.String datasetName)
          This method is used to generate a Dataset based on a result generated by a previous Mobius job, so that the user can continue to refine the Dataset.
static DatasetBuildersFactory getInstance(MobiusJob job)
          Get the singleton instance of DatasetBuildersFactory.
 DatasetBuildersFactory register(java.lang.Class<? extends org.apache.hadoop.mapred.OutputFormat> outputFormat, java.lang.Class<? extends AbstractDatasetBuilder> builder)
          Register a new implementation of AbstractDatasetBuilder which generates a Dataset that reads the data generated by the OutputFormat.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_DATASET_BUILDERS

protected java.util.Map<java.lang.Class<? extends org.apache.hadoop.mapred.OutputFormat>,java.lang.Class<? extends AbstractDatasetBuilder>> _DATASET_BUILDERS
Mapping from a given OutputFormat to an implementation of AbstractDatasetBuilder.

Method Detail

getInstance

public static DatasetBuildersFactory getInstance(MobiusJob job)
                                          throws java.io.IOException
Get the singleton instance of DatasetBuildersFactory.

Throws:
java.io.IOException

getBuilder

public AbstractDatasetBuilder getBuilder(java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> prevJobOutFmt,
                                         java.lang.String datasetName)
This method is used to generate a Dataset based on a result generated by a previous Mobius job, so that the user can continue to refine the Dataset.

Parameters:
prevJobOutFmt - the output format of the previous job (an intermediate result in a flow).
datasetName - the name to be used for the new dataset.
Returns:
an implementation of AbstractDatasetBuilder for building a dataset from the intermediate result.
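
One plausible way to implement such a lookup is to resolve the builder class from the mapping and instantiate it reflectively. The sketch below only illustrates that idea and is not the Mobius implementation: all types are hypothetical stand-ins, the single-String constructor is an assumption, and throwing IllegalArgumentException for an unregistered format is a choice made for this example.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

public class GetBuilderSketch {
    // Hypothetical stand-ins for Hadoop's FileOutputFormat hierarchy.
    interface FileOutputFormat {}
    static class TextOutputFormat implements FileOutputFormat {}
    static class UnregisteredFormat implements FileOutputFormat {}

    // Hypothetical stand-in builder; the String constructor is an assumption.
    static abstract class AbstractDatasetBuilder {
        final String datasetName;
        AbstractDatasetBuilder(String datasetName) { this.datasetName = datasetName; }
    }
    static class TSVDatasetBuilder extends AbstractDatasetBuilder {
        TSVDatasetBuilder(String datasetName) { super(datasetName); }
    }

    // Plays the role of the format-to-builder mapping.
    static final Map<Class<? extends FileOutputFormat>, Class<? extends AbstractDatasetBuilder>> BUILDERS =
            new HashMap<>();
    static {
        BUILDERS.put(TextOutputFormat.class, TSVDatasetBuilder.class);
    }

    // Looks up the builder class for the previous job's output format and
    // creates an instance that carries the requested dataset name.
    static AbstractDatasetBuilder getBuilder(Class<? extends FileOutputFormat> prevJobOutFmt,
                                             String datasetName) {
        Class<? extends AbstractDatasetBuilder> type = BUILDERS.get(prevJobOutFmt);
        if (type == null) {
            throw new IllegalArgumentException("No builder registered for " + prevJobOutFmt.getName());
        }
        try {
            Constructor<? extends AbstractDatasetBuilder> ctor = type.getDeclaredConstructor(String.class);
            return ctor.newInstance(datasetName);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot instantiate " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        AbstractDatasetBuilder builder = getBuilder(TextOutputFormat.class, "step1_result");
        System.out.println(builder.getClass().getSimpleName() + " for " + builder.datasetName);
    }
}
```

Storing classes rather than instances in the mapping lets each call produce a fresh builder bound to its own dataset name.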

register

public DatasetBuildersFactory register(java.lang.Class<? extends org.apache.hadoop.mapred.OutputFormat> outputFormat,
                                       java.lang.Class<? extends AbstractDatasetBuilder> builder)
                                throws java.io.IOException
Register a new implementation of AbstractDatasetBuilder which generates a Dataset that reads the data generated by the OutputFormat.

Parameters:
outputFormat - the output format type of a previous job; the given builder will be used to create a dataset from results in this format.
builder - an implementation of AbstractDatasetBuilder to build the dataset from an intermediate result (in the format of the given outputFormat).
Returns:
the DatasetBuildersFactory itself.
Throws:
java.io.IOException
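
Because register returns the factory itself, registrations can be chained. The following is a minimal sketch of that fluent pattern using hypothetical stand-in types only; the Factory class, the two custom formats, and their builders are invented for illustration, and unlike the real method this sketch does not throw java.io.IOException.

```java
import java.util.HashMap;
import java.util.Map;

public class RegisterSketch {
    // Hypothetical stand-ins for the output format and builder hierarchies.
    interface OutputFormat {}
    static class CustomAOutputFormat implements OutputFormat {}
    static class CustomBOutputFormat implements OutputFormat {}

    static abstract class AbstractDatasetBuilder {}
    static class CustomADatasetBuilder extends AbstractDatasetBuilder {}
    static class CustomBDatasetBuilder extends AbstractDatasetBuilder {}

    // Hypothetical stand-in for the factory; only the chaining behavior is modeled.
    static class Factory {
        private final Map<Class<? extends OutputFormat>, Class<? extends AbstractDatasetBuilder>> builders =
                new HashMap<>();

        // Records the mapping and returns this factory so calls can be chained.
        Factory register(Class<? extends OutputFormat> outputFormat,
                         Class<? extends AbstractDatasetBuilder> builder) {
            builders.put(outputFormat, builder);
            return this;
        }

        // Returns the registered builder class, or null if none was registered.
        Class<? extends AbstractDatasetBuilder> lookup(Class<? extends OutputFormat> outputFormat) {
            return builders.get(outputFormat);
        }
    }

    public static void main(String[] args) {
        Factory factory = new Factory()
                .register(CustomAOutputFormat.class, CustomADatasetBuilder.class)
                .register(CustomBOutputFormat.class, CustomBDatasetBuilder.class);
        System.out.println(factory.lookup(CustomAOutputFormat.class).getSimpleName());
        System.out.println(factory.lookup(CustomBOutputFormat.class).getSimpleName());
    }
}
```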