java.lang.Object
  com.ebay.erl.mobius.core.builder.AbstractDatasetBuilder<TSVDatasetBuilder>
    com.ebay.erl.mobius.core.builder.TSVDatasetBuilder

public class TSVDatasetBuilder
extends AbstractDatasetBuilder<TSVDatasetBuilder>
Represents text-based and line-oriented files on HDFS.
Each line is split by the delimiter (the default is tab), and the file is assigned the schema given when the builder is created.
If a line contains fewer values than the schema has columns, the remaining columns are assigned a null value.
If a line contains more values than the schema has columns, the extra values are put into the tuple under the name IDX_$i, where $i starts from the length of the given schema.
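The following standalone sketch illustrates the column-assignment rule described above. It is not the TSVMapper implementation, only an approximation of the same behavior; the toTuple helper and the example schema are hypothetical.

import java.util.LinkedHashMap;
import java.util.Map;

// Illustration of the column-assignment rule described above; NOT the actual TSVMapper code.
public class TsvLineSketch
{
    static Map<String, String> toTuple(String line, String[] schema, String delimiter)
    {
        String[] values = line.split(delimiter, -1);
        Map<String, String> tuple = new LinkedHashMap<String, String>();
        for (int i = 0; i < schema.length; i++)
        {
            // fewer values than schema columns: the remaining columns get null
            tuple.put(schema[i], i < values.length ? values[i] : null);
        }
        for (int i = schema.length; i < values.length; i++)
        {
            // extra values are kept under IDX_$i, with $i starting at the schema length
            tuple.put("IDX_" + i, values[i]);
        }
        return tuple;
    }

    public static void main(String[] args)
    {
        String[] schema = {"USER", "ITEM", "PRICE"};
        System.out.println(toTuple("u1\t42", schema, "\t"));            // {USER=u1, ITEM=42, PRICE=null}
        System.out.println(toTuple("u1\t42\t9.99\tUSD", schema, "\t")); // {USER=u1, ITEM=42, PRICE=9.99, IDX_3=USD}
    }
}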
This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache hadoop which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan
Field Summary

Fields inherited from class com.ebay.erl.mobius.core.builder.AbstractDatasetBuilder:
    computedColumns, datasetName, mobiusJob
Constructor Summary

protected TSVDatasetBuilder(MobiusJob job, java.lang.String datasetName)
Method Summary

Dataset buildFromPreviousJob(org.apache.hadoop.mapred.JobConf prevJob, java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> prevJobOutputFormat, java.lang.String[] schema)
    Called by the Mobius engine to build a dataset from a previous Mobius job; users should not call this method.

protected Dataset newDataset(java.lang.String datasetName)
    Creates a new Dataset; the returned Dataset has no state at all (no paths, constraints, etc.).

static TSVDatasetBuilder newInstance(MobiusJob job, java.lang.String name, java.lang.String[] schema)
    Creates a new instance of TSVDatasetBuilder to build a text-based dataset.

TSVDatasetBuilder setDelimiter(java.lang.String delimiter)
    Specifies the delimiter of the underlying text file.

TSVDatasetBuilder setMapper(java.lang.Class<? extends TSVMapper> mapper)
    Changes the default mapper implementation (the default is TSVMapper); users should call this method when the parsing logic in TSVMapper does not meet their requirements.
Methods inherited from class com.ebay.erl.mobius.core.builder.AbstractDatasetBuilder:
    addComuptedColumn, addInputPath, addInputPath, build, checkTouchFile, constraint, getDataset, setSchema
Methods inherited from class java.lang.Object:
    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail

protected TSVDatasetBuilder(MobiusJob job, java.lang.String datasetName) throws java.io.IOException
Throws:
    java.io.IOException
Method Detail

public static TSVDatasetBuilder newInstance(MobiusJob job, java.lang.String name, java.lang.String[] schema) throws java.io.IOException
Creates a new instance of TSVDatasetBuilder to build a text-based dataset.
By default, the underlying text file is read and split by tab in TSVMapper, then converted into a Tuple with the given schema. See TSVMapper for more detail.
Parameters:
    job - a Mobius job that contains an analysis flow.
    name - the name of the dataset to be built.
    schema - the schema of the underlying dataset.
Throws:
    java.io.IOException
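A minimal usage sketch follows. Only the newInstance signature documented above is relied on; the package of MobiusJob is an assumption, the dataset name and schema are illustrative, and the MobiusJob instance is passed in from an existing analysis flow whose setup is outside this page.

import java.io.IOException;

import com.ebay.erl.mobius.core.MobiusJob;                // assumed package, not shown on this page
import com.ebay.erl.mobius.core.builder.TSVDatasetBuilder;

public class NewInstanceSketch
{
    static TSVDatasetBuilder clicksBuilder(MobiusJob job) throws IOException
    {
        return TSVDatasetBuilder.newInstance(
            job,
            "clicks",                                      // dataset name
            new String[] {"USER", "ITEM", "PRICE"});       // schema applied to each tab-separated line
    }
}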
public TSVDatasetBuilder setDelimiter(java.lang.String delimiter) throws java.io.IOException
Specifies the delimiter of the underlying text file. The default delimiter is tab.
Parameters:
    delimiter - the delimiter used to split each line of the underlying text file.
Throws:
    java.io.IOException
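For example, a comma-separated file could be declared as follows. This is a sketch under the same assumptions as the newInstance example above; only newInstance and setDelimiter, whose signatures are documented on this page, are used.

import java.io.IOException;

import com.ebay.erl.mobius.core.MobiusJob;                // assumed package, not shown on this page
import com.ebay.erl.mobius.core.builder.TSVDatasetBuilder;

public class SetDelimiterSketch
{
    static TSVDatasetBuilder ordersBuilder(MobiusJob job) throws IOException
    {
        return TSVDatasetBuilder
            .newInstance(job, "orders", new String[] {"ORDER_ID", "BUYER", "TOTAL"})
            .setDelimiter(",");                            // default is tab; here each line is split on commas
    }
}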
protected Dataset newDataset(java.lang.String datasetName)
Description copied from class: AbstractDatasetBuilder
Creates a new Dataset; the returned Dataset has no state at all (no paths, constraints, etc.).
Specified by:
    newDataset in class AbstractDatasetBuilder<TSVDatasetBuilder>
public TSVDatasetBuilder setMapper(java.lang.Class<? extends TSVMapper> mapper)
Changes the default mapper implementation (the default is TSVMapper); users should call this method when the parsing logic in TSVMapper does not meet their requirements.
Parameters:
    mapper - the TSVMapper subclass to use for parsing the underlying text file.
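A sketch of plugging in a custom parser is shown below. CustomTSVMapper is a hypothetical subclass, the TSVMapper package is an assumption, and the hook it should override (and whether it must override one at all) is documented on the TSVMapper page rather than here; only the setMapper signature above is relied on.

import java.io.IOException;

import com.ebay.erl.mobius.core.MobiusJob;                // assumed package
import com.ebay.erl.mobius.core.builder.TSVDatasetBuilder;
import com.ebay.erl.mobius.core.mapred.TSVMapper;          // assumed package

// Hypothetical subclass; override the parsing hook described in the TSVMapper
// documentation when the default tab-splitting logic does not meet the requirement.
class CustomTSVMapper extends TSVMapper
{
}

public class SetMapperSketch
{
    static TSVDatasetBuilder logsBuilder(MobiusJob job) throws IOException
    {
        return TSVDatasetBuilder
            .newInstance(job, "logs", new String[] {"TIMESTAMP", "LEVEL", "MESSAGE"})
            .setMapper(CustomTSVMapper.class);             // replaces the default TSVMapper
    }
}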
public Dataset buildFromPreviousJob(org.apache.hadoop.mapred.JobConf prevJob, java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> prevJobOutputFormat, java.lang.String[] schema) throws java.io.IOException
Called by the Mobius engine to build a dataset from a previous Mobius job; users should not call this method.
Specified by:
    buildFromPreviousJob in class AbstractDatasetBuilder<TSVDatasetBuilder>
Throws:
    java.io.IOException