com.ebay.erl.mobius.core.builder
Class SeqFileDatasetBuilder

java.lang.Object
  extended by com.ebay.erl.mobius.core.builder.AbstractDatasetBuilder<SeqFileDatasetBuilder>
      extended by com.ebay.erl.mobius.core.builder.SeqFileDatasetBuilder

public class SeqFileDatasetBuilder
extends AbstractDatasetBuilder<SeqFileDatasetBuilder>

Reads a SequenceFile with NullWritable as its key and Tuple as its value.

The default mapper is DefaultSeqFileMapper, which accepts only NullWritable as the key type and Tuple as the value type from the underlying sequence file.

If the sequence file includes different key and value types, specify a different implementation of SequenceFileMapper using setMapper(Class).

This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache Hadoop, which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan


Field Summary
protected  java.lang.Class<? extends SequenceFileMapper> mapperClass
          The mapper class of this builder; defaults to DefaultSeqFileMapper.
 
Fields inherited from class com.ebay.erl.mobius.core.builder.AbstractDatasetBuilder
computedColumns, datasetName, mobiusJob
 
Constructor Summary
protected SeqFileDatasetBuilder(MobiusJob aJob, java.lang.String datasetName)
           
 
Method Summary
 Dataset buildFromPreviousJob(org.apache.hadoop.mapred.JobConf prevJob, java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> prevJobOutputFormat, java.lang.String[] schema)
          Called by the Mobius engine to build a dataset from a previous Mobius job; users should not call this method directly.
protected  Dataset newDataset(java.lang.String datasetName)
          Creates a new Dataset. The returned Dataset has no state at all (no paths, constraints, etc.).
static SeqFileDatasetBuilder newInstance(MobiusJob job, java.lang.String name, java.lang.String[] schema)
          Gets a new instance of SeqFileDatasetBuilder to build a dataset that is stored as a Hadoop sequence file.
 SeqFileDatasetBuilder setMapper(java.lang.Class<? extends SequenceFileMapper> mapperClass)
          Sets a new implementation of SequenceFileMapper to parse the underlying sequence file records into tuples.
 
Methods inherited from class com.ebay.erl.mobius.core.builder.AbstractDatasetBuilder
addComuptedColumn, addInputPath, addInputPath, build, checkTouchFile, constraint, getDataset, setSchema
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mapperClass

protected java.lang.Class<? extends SequenceFileMapper> mapperClass
The mapper class of this builder; defaults to DefaultSeqFileMapper.

Constructor Detail

SeqFileDatasetBuilder

protected SeqFileDatasetBuilder(MobiusJob aJob,
                                java.lang.String datasetName)
Method Detail

newInstance

public static SeqFileDatasetBuilder newInstance(MobiusJob job,
                                                java.lang.String name,
                                                java.lang.String[] schema)
                                         throws java.io.IOException
Gets a new instance of SeqFileDatasetBuilder to build a dataset that is stored as a Hadoop sequence file.

By default, a SeqFileDatasetBuilder uses DefaultSeqFileMapper to parse the underlying sequence file records into Tuples, and the schema is applied to every Tuple.

Note that the schema does not name the key and value in the sequence file; it names the fields of the parsed results (Tuples).

Parameters:
job - the Mobius job that contains the analysis flow.
name - the name of the dataset to be built.
schema - the schema of this dataset.
Throws:
java.io.IOException
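
The distinction between schema names and the sequence file's key and value can be illustrated with a small, self-contained sketch. It does not use the Mobius API: SchemaSketch and applySchema are hypothetical names, and the positional pairing of schema names with parsed values is an assumption made for illustration, not confirmed behavior of DefaultSeqFileMapper.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not part of Mobius.
final class SchemaSketch {

    // Pairs each schema name with the parsed value at the same position.
    // Assumption: names are applied positionally to the parsed fields.
    static Map<String, Object> applySchema(String[] schema, Object[] parsedValues) {
        if (schema.length != parsedValues.length) {
            throw new IllegalArgumentException(
                "schema has " + schema.length + " names but record has "
                + parsedValues.length + " values");
        }
        Map<String, Object> tuple = new LinkedHashMap<>();
        for (int i = 0; i < schema.length; i++) {
            tuple.put(schema[i], parsedValues[i]);
        }
        return tuple;
    }

    public static void main(String[] args) {
        // The names describe the parsed fields, not the file's key or value.
        String[] schema = {"ID", "NAME"};
        Object[] parsed = {42L, "alice"};
        System.out.println(applySchema(schema, parsed)); // prints {ID=42, NAME=alice}
    }
}
```

In this sketch the sequence file's key never appears: only the fields parsed out of each record receive names.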

setMapper

public SeqFileDatasetBuilder setMapper(java.lang.Class<? extends SequenceFileMapper> mapperClass)
Sets a new implementation of SequenceFileMapper to parse the underlying sequence file records into tuples.
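
Because setMapper returns the builder itself, it can be chained with other builder calls. The sketch below mirrors that pattern with stand-in types only: SketchMapper, SketchBuilder, SeqFileSketchBuilder and the other names are hypothetical and are not Mobius classes, and it assumes that inherited setters such as addInputPath also return the concrete builder type, which this page does not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; none of these are Mobius classes.
interface SketchMapper {}
final class DefaultSketchMapper implements SketchMapper {}
final class KeyValueSketchMapper implements SketchMapper {}

// Self-typed base builder: inherited setters return the concrete subtype.
abstract class SketchBuilder<B extends SketchBuilder<B>> {
    protected final List<String> inputPaths = new ArrayList<>();

    @SuppressWarnings("unchecked")
    B addInputPath(String path) {
        inputPaths.add(path);
        return (B) this;
    }
}

final class SeqFileSketchBuilder extends SketchBuilder<SeqFileSketchBuilder> {
    // Mirrors the documented default: a default mapper until one is set.
    protected Class<? extends SketchMapper> mapperClass = DefaultSketchMapper.class;

    SeqFileSketchBuilder setMapper(Class<? extends SketchMapper> mapperClass) {
        this.mapperClass = mapperClass;
        return this; // returning the builder allows call chaining
    }

    String describe() {
        return mapperClass.getSimpleName() + " over " + inputPaths;
    }
}

final class BuilderSketchDemo {
    public static void main(String[] args) {
        SeqFileSketchBuilder builder = new SeqFileSketchBuilder()
                .addInputPath("/data/events")
                .setMapper(KeyValueSketchMapper.class);
        System.out.println(builder.describe());
    }
}
```

The self-typed generic parameter is what lets a method declared on the base class hand back the subclass, so subclass-only methods such as setMapper stay reachable in the middle of a chain.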


newDataset

protected Dataset newDataset(java.lang.String datasetName)
Creates a new Dataset. The returned Dataset has no state at all (no paths, constraints, etc.).

Specified by:
newDataset in class AbstractDatasetBuilder<SeqFileDatasetBuilder>

buildFromPreviousJob

public Dataset buildFromPreviousJob(org.apache.hadoop.mapred.JobConf prevJob,
                                    java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> prevJobOutputFormat,
                                    java.lang.String[] schema)
                             throws java.io.IOException
Called by the Mobius engine to build a dataset from a previous Mobius job; users should not call this method directly. If prevJobOutputFormat is SequenceFileOutputFormat, Mobius uses this class to build a dataset from prevJob, which holds the intermediate results of a Mobius job.

Overrides:
buildFromPreviousJob in class AbstractDatasetBuilder<SeqFileDatasetBuilder>
Throws:
java.io.IOException