com.ebay.erl.mobius.core.builder
Class AbstractDatasetBuilder<ACUTAL_BUILDER_IMPL>

java.lang.Object
  extended by com.ebay.erl.mobius.core.builder.AbstractDatasetBuilder<ACUTAL_BUILDER_IMPL>
Type Parameters:
ACUTAL_BUILDER_IMPL - the concrete implementation of an AbstractDatasetBuilder.
Direct Known Subclasses:
SeqFileDatasetBuilder, TSVDatasetBuilder

public abstract class AbstractDatasetBuilder<ACUTAL_BUILDER_IMPL>
extends java.lang.Object

The base class of all Dataset builders, which build instances of the different Dataset types.
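The type parameter lets chained builder methods return the concrete builder type instead of the base type. The following is a minimal sketch of that idea, not the Mobius source: SketchDataset, SketchBuilder, SketchTsvBuilder, self() and setDelimiter are hypothetical names, and the bound on the type parameter is an assumption (the signature above declares ACUTAL_BUILDER_IMPL without a bound).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Dataset; not the Mobius class.
class SketchDataset {
    final String name;
    final List<String> schema = new ArrayList<>();

    SketchDataset(String name) {
        this.name = name;
    }
}

// Simplified analogue of AbstractDatasetBuilder<ACUTAL_BUILDER_IMPL>.
// The bound "B extends SketchBuilder<B>" is an assumption of this sketch.
abstract class SketchBuilder<B extends SketchBuilder<B>> {
    protected final String datasetName;
    protected final List<String> schema = new ArrayList<>();

    protected SketchBuilder(String datasetName) {
        this.datasetName = datasetName;
    }

    // Each subclass returns "this", typed as the concrete builder.
    protected abstract B self();

    public B setSchema(String... columns) {
        schema.clear();
        for (String column : columns) {
            schema.add(column);
        }
        return self();
    }

    public SketchDataset build() {
        SketchDataset dataset = new SketchDataset(datasetName);
        dataset.schema.addAll(schema);
        return dataset;
    }
}

// Hypothetical concrete builder with one subclass-only setter.
class SketchTsvBuilder extends SketchBuilder<SketchTsvBuilder> {
    private String delimiter = "\t";

    SketchTsvBuilder(String datasetName) {
        super(datasetName);
    }

    @Override
    protected SketchTsvBuilder self() {
        return this;
    }

    public SketchTsvBuilder setDelimiter(String delimiter) {
        this.delimiter = delimiter;
        return this;
    }

    public String getDelimiter() {
        return delimiter;
    }
}

class SelfTypedBuilderDemo {
    public static void main(String[] args) {
        SketchDataset dataset = new SketchTsvBuilder("orders")
                .setSchema("ID", "AMOUNT") // returns SketchTsvBuilder, not the base type
                .setDelimiter(",")         // so the subclass-only method stays chainable
                .build();
        System.out.println(dataset.name + " " + dataset.schema);
    }
}
```

Without the type parameter, setSchema would return the base builder type and the call to setDelimiter in the chain would not compile.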

This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache Hadoop, which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan


Field Summary
protected  java.util.List<ComputedColumns> computedColumns
          The ComputedColumns for this dataset.
protected  java.lang.String datasetName
          The name of this dataset.
protected  MobiusJob mobiusJob
          An instance of MobiusJob which contains the analysis flow.
 
Constructor Summary
protected AbstractDatasetBuilder(MobiusJob aJob, java.lang.String datasetName)
          Constructor for creating a dataset builder.
 
Method Summary
 ACUTAL_BUILDER_IMPL addComuptedColumn(ComputedColumns aComputedColumn)
          Add a ComputedColumns to this dataset.
protected  ACUTAL_BUILDER_IMPL addInputPath(boolean validatePathExistance, org.apache.hadoop.fs.Path... paths)
          Add the paths to the underlying dataset.
 ACUTAL_BUILDER_IMPL addInputPath(org.apache.hadoop.fs.Path... paths)
          Specify the input path(s) of a Dataset.
 Dataset build()
          Finishes the Dataset building process.
 Dataset buildFromPreviousJob(org.apache.hadoop.mapred.JobConf prevJob, java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> prevJobOutputFormat, java.lang.String[] schema)
          Called by the Mobius engine to build a dataset from a previous Mobius job; users should not call this method.
protected  boolean checkTouchFile(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path aPath)
          Check whether a touch file exists within the given folder (aPath).
 ACUTAL_BUILDER_IMPL constraint(TupleCriterion criteria)
          Apply a filter to the records of this Dataset; only rows within the Dataset that meet the criteria are output.
protected  Dataset getDataset()
          Get the dataset; if it is null, newDataset(String) is called and its result is stored as the dataset and returned.
protected abstract  Dataset newDataset(java.lang.String datasetName)
          Create a new Dataset; the returned Dataset has no state at all (no paths, constraints, etc.).
 ACUTAL_BUILDER_IMPL setSchema(java.lang.String... schema)
          Specify the schema of this Dataset.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mobiusJob

protected MobiusJob mobiusJob
An instance of MobiusJob which contains the analysis flow.


datasetName

protected java.lang.String datasetName
The name of this dataset.


computedColumns

protected java.util.List<ComputedColumns> computedColumns
The ComputedColumns for this dataset.

Constructor Detail

AbstractDatasetBuilder

protected AbstractDatasetBuilder(MobiusJob aJob,
                                 java.lang.String datasetName)
Constructor for creating a dataset builder.

Method Detail

getDataset

protected Dataset getDataset()
Get the dataset; if it is null, newDataset(String) is called and its result is stored as the dataset and returned.


newDataset

protected abstract Dataset newDataset(java.lang.String datasetName)
Create a new Dataset; the returned Dataset has no state at all (no paths, constraints, etc.).
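The interplay between getDataset() and newDataset(String) reads like lazy initialization through a factory method. The sketch below shows that pattern under this assumption; LazyDataset, LazyBuilder and CountingBuilder are hypothetical names and do not come from Mobius.

```java
// Hypothetical stand-in for Dataset; not the Mobius class.
class LazyDataset {
    final String name;

    LazyDataset(String name) {
        this.name = name;
    }
}

// Simplified analogue of the getDataset()/newDataset(String) pair.
abstract class LazyBuilder {
    private final String datasetName;
    private LazyDataset dataset; // stays null until the first getDataset() call

    protected LazyBuilder(String datasetName) {
        this.datasetName = datasetName;
    }

    // Creates the dataset on first use, then keeps returning the same instance.
    protected LazyDataset getDataset() {
        if (dataset == null) {
            dataset = newDataset(datasetName);
        }
        return dataset;
    }

    // Factory hook: each concrete builder decides which dataset type to create.
    protected abstract LazyDataset newDataset(String datasetName);
}

// Hypothetical subclass that counts how often the factory hook runs.
class CountingBuilder extends LazyBuilder {
    int created = 0;

    CountingBuilder(String datasetName) {
        super(datasetName);
    }

    @Override
    protected LazyDataset newDataset(String datasetName) {
        created++;
        return new LazyDataset(datasetName);
    }
}

class LazyDatasetDemo {
    public static void main(String[] args) {
        CountingBuilder builder = new CountingBuilder("orders");
        LazyDataset first = builder.getDataset();
        LazyDataset second = builder.getDataset();
        System.out.println((first == second) + " " + builder.created);
    }
}
```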


buildFromPreviousJob

public Dataset buildFromPreviousJob(org.apache.hadoop.mapred.JobConf prevJob,
                                    java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> prevJobOutputFormat,
                                    java.lang.String[] schema)
                             throws java.io.IOException
Called by the Mobius engine to build a dataset from a previous Mobius job; users should not call this method.

Throws:
java.io.IOException

setSchema

public ACUTAL_BUILDER_IMPL setSchema(java.lang.String... schema)
Specify the schema of this Dataset.


addComuptedColumn

public ACUTAL_BUILDER_IMPL addComuptedColumn(ComputedColumns aComputedColumn)
Add a ComputedColumns to this dataset.

See Also:
ComputedColumns

build

public Dataset build()
              throws java.lang.IllegalStateException
Finishes the Dataset building process.

Invoke this method to get a reference to a Dataset so that it can be used in MobiusJob.innerJoin(Dataset...), MobiusJob.list(Dataset, com.ebay.erl.mobius.core.model.Column...), and so on.

Returns:
an instance of Dataset
Throws:
java.lang.IllegalStateException - when the user doesn't specify all the required parameters (no input path, for example) during the building process.
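The contract of build() can be pictured as a state check that runs before the dataset is handed out. This is a hedged sketch with a hypothetical CheckedBuilder that returns a description string in place of a Dataset. The documentation names only a missing input path as an example of a missing parameter; treating an empty schema as an error is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in builder; it only illustrates the state check in build().
class CheckedBuilder {
    private final String datasetName;
    private final List<String> inputPaths = new ArrayList<>();
    private String[] schema = new String[0];

    CheckedBuilder(String datasetName) {
        this.datasetName = datasetName;
    }

    CheckedBuilder addInputPath(String... paths) {
        inputPaths.addAll(Arrays.asList(paths));
        return this;
    }

    CheckedBuilder setSchema(String... schema) {
        this.schema = schema.clone();
        return this;
    }

    // Returns a description string in place of a real Dataset.
    String build() {
        if (inputPaths.isEmpty()) {
            throw new IllegalStateException("no input path specified for dataset " + datasetName);
        }
        if (schema.length == 0) {
            // Assumption: the schema is also treated as required in this sketch.
            throw new IllegalStateException("no schema specified for dataset " + datasetName);
        }
        return datasetName + " " + inputPaths + " " + Arrays.asList(schema);
    }
}

class BuildCheckDemo {
    public static void main(String[] args) {
        try {
            new CheckedBuilder("orders").setSchema("ID", "AMOUNT").build();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        String described = new CheckedBuilder("orders")
                .addInputPath("/data/orders")
                .setSchema("ID", "AMOUNT")
                .build();
        System.out.println(described);
    }
}
```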

addInputPath

public ACUTAL_BUILDER_IMPL addInputPath(org.apache.hadoop.fs.Path... paths)
                                 throws java.io.IOException
Specify the input path(s) of a Dataset.

Parameters:
paths - one or more paths that contain the dataset
Returns:
the builder itself.
Throws:
java.io.IOException

checkTouchFile

protected boolean checkTouchFile(org.apache.hadoop.fs.FileSystem fs,
                                 org.apache.hadoop.fs.Path aPath)
Check whether a touch file exists within the given folder (aPath).

This method is invoked when the user calls addInputPath(Path...). It returns true by default, i.e., touch files are not checked. A touch file indicates that all the files for a dataset are ready. If the deployed Hadoop system generates a touch file for each Hadoop output folder, override this method to enable touch file checking.

Parameters:
fs - the file system that contains aPath
aPath - the folder to check for a touch file
Returns:
true in default implementation.

addInputPath

protected ACUTAL_BUILDER_IMPL addInputPath(boolean validatePathExistance,
                                           org.apache.hadoop.fs.Path... paths)
                                    throws java.io.IOException
Add the paths to the underlying dataset. The boolean flag validatePathExistance specifies whether Mobius needs to verify that the specified paths exist.

If validatePathExistance is true and one of the paths doesn't exist, an IOException is thrown.

If a path exists and is a folder, checkTouchFile(FileSystem, Path) is called to see whether a touch file exists under that folder. The default implementation of checkTouchFile always returns true, which means the dataset builder doesn't check for touch files by default. If touch files need to be checked, the subclass should override that method; when it returns false, an IOException is thrown here for that specific path.

Throws:
java.io.IOException
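The validation flow described for this method can be sketched on the local file system. This is an illustration, not the Mobius implementation: it uses java.nio.file in place of the Hadoop FileSystem and Path types, PathValidatingBuilder and SuccessMarkerBuilder are hypothetical names, and the marker file name _SUCCESS is an assumption about what the touch file is called in a given deployment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical analogue of the path handling, using java.nio.file instead of Hadoop.
class PathValidatingBuilder {
    protected final List<Path> inputPaths = new ArrayList<>();

    // Mirrors the documented default: always true, i.e. no touch file check.
    protected boolean checkTouchFile(Path folder) {
        return true;
    }

    protected PathValidatingBuilder addInputPath(boolean validatePathExistance, Path... paths)
            throws IOException {
        for (Path path : paths) {
            if (validatePathExistance && !Files.exists(path)) {
                throw new IOException("Input path does not exist: " + path);
            }
            // Only existing folders are subject to the touch file check.
            if (Files.isDirectory(path) && !checkTouchFile(path)) {
                throw new IOException("No touch file found under: " + path);
            }
            inputPaths.add(path);
        }
        return this;
    }
}

// Subclass that enables the check; the marker name "_SUCCESS" is an assumption.
class SuccessMarkerBuilder extends PathValidatingBuilder {
    @Override
    protected boolean checkTouchFile(Path folder) {
        return Files.exists(folder.resolve("_SUCCESS"));
    }
}

class TouchFileDemo {
    public static void main(String[] args) throws IOException {
        Path folder = Files.createTempDirectory("touch-file-sketch");
        try {
            new SuccessMarkerBuilder().addInputPath(true, folder);
        } catch (IOException e) {
            System.out.println("rejected: folder has no marker file yet");
        }
        Files.createFile(folder.resolve("_SUCCESS"));
        new SuccessMarkerBuilder().addInputPath(true, folder);
        System.out.println("accepted: marker file present");
    }
}
```

Keeping the check in an overridable method means deployments that do not write marker files need no changes, while those that do can opt in with a small subclass.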

constraint

public ACUTAL_BUILDER_IMPL constraint(TupleCriterion criteria)
Apply a filter to the records of this Dataset; only rows within the Dataset that meet the criteria are output.

Parameters:
criteria - the criterion that a record must meet to be output
Returns:
the builder itself.