java.lang.Object
  com.ebay.erl.mobius.core.builder.Dataset
public class Dataset
Represents a type of data on the Hadoop cluster.
A dataset contains the following information:
- InputFormat: specifies the format Hadoop uses to read the data.
- Path: indicates the data location of this dataset.
- AbstractMobiusMapper: provides Hadoop with the information on the mapper class.
An instance of Dataset is built by an implementation of AbstractDatasetBuilder.
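The relationship between a dataset and its builder can be pictured with a small plain-Java model. This is a hypothetical sketch, not the Mobius API: DatasetSketch and DatasetBuilderSketch are invented names, and Strings stand in for the Hadoop InputFormat class, the input Paths, and the AbstractMobiusMapper class. The check in build() mirrors the idea of validate(), under the assumption that all three pieces of information are required.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical stand-in for Dataset: Strings replace the Hadoop InputFormat
// class, the org.apache.hadoop.fs.Path inputs, and the AbstractMobiusMapper class.
final class DatasetSketch {
    final String name;
    final String inputFormat;   // how the data is read
    final List<String> inputs;  // where the data lives
    final String mapper;        // which mapper parses the records

    DatasetSketch(String name, String inputFormat, List<String> inputs, String mapper) {
        this.name = name;
        this.inputFormat = inputFormat;
        this.inputs = Collections.unmodifiableList(new ArrayList<>(inputs));
        this.mapper = mapper;
    }
}

// Hypothetical builder playing the role of an AbstractDatasetBuilder implementation.
final class DatasetBuilderSketch {
    private final String name;
    private final List<String> inputs = new ArrayList<>();
    private String inputFormat;
    private String mapper;

    DatasetBuilderSketch(String name) {
        this.name = Objects.requireNonNull(name, "name");
    }

    DatasetBuilderSketch inputFormat(String format) {
        this.inputFormat = format;
        return this;
    }

    DatasetBuilderSketch mapper(String mapperClass) {
        this.mapper = mapperClass;
        return this;
    }

    DatasetBuilderSketch addInput(String path) {
        this.inputs.add(path);
        return this;
    }

    // Same idea as validate(): refuse to build when a required part is missing.
    DatasetSketch build() {
        if (inputFormat == null) throw new IllegalStateException("input format not set");
        if (mapper == null) throw new IllegalStateException("mapper not set");
        if (inputs.isEmpty()) throw new IllegalStateException("no input paths");
        return new DatasetSketch(name, inputFormat, inputs, mapper);
    }

    public static void main(String[] args) {
        DatasetSketch orders = new DatasetBuilderSketch("orders")
                .inputFormat("org.apache.hadoop.mapred.TextInputFormat")
                .mapper("OrdersMapper")
                .addInput("/data/orders")
                .build();
        System.out.println(orders.name + " reads " + orders.inputs);
    }
}
```

Keeping the setters on the builder and the fields final on the dataset means a half-configured dataset never escapes the building process.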
This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache hadoop which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan
Field Summary
Modifier and Type | Field and Description |
---|---|
protected java.util.ArrayList<ComputedColumns> | computedColumns: stores the user-defined ComputedColumns of this dataset, if any. |
protected org.apache.hadoop.conf.Configuration | conf: the Hadoop configuration. |
protected java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat> | input_format: the InputFormat of this dataset, so Hadoop knows how to read this dataset. |
protected MobiusJob | job: the Mobius job that contains the analysis flow. |
protected java.lang.Class<? extends AbstractMobiusMapper> | mapper: the corresponding AbstractMobiusMapper implementation, which parses the records of this dataset into Tuples. |
protected java.lang.String | name: the name of this dataset. |
protected java.util.LinkedHashSet<java.lang.String> | schema: the schema of this Dataset, using a LinkedHashSet to preserve the schema order. |
protected TupleCriterion | tupleConstraint: the tuple constraint. |
Constructor Summary
Modifier | Constructor and Description |
---|---|
protected | Dataset(MobiusJob job, java.lang.String name) |
Method Summary
Modifier and Type | Method and Description |
---|---|
protected void | addComputedColumn(ComputedColumns aComputedColumn): Add a ComputedColumns to this dataset. |
org.apache.hadoop.mapred.JobConf | createJobConf(int jobSequenceNumber): Create a Hadoop JobConf that represents this dataset. |
boolean | equals(java.lang.Object obj): Return true only if obj is an instance of Dataset and the name, input format, mapper, and schema of this dataset and obj are all equal. |
java.lang.String | getDatasetID(int jobSequenceNumber): Get the ID for this dataset. |
java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat> | getInputFormat(): Get the InputFormat of this dataset. |
java.util.List<org.apache.hadoop.fs.Path> | getInputs(): Get the input paths of this dataset. |
java.lang.Class<? extends AbstractMobiusMapper> | getMapper(): Get the AbstractMobiusMapper of this dataset. |
java.lang.String | getName(): Get the name of this dataset. |
protected java.util.LinkedHashSet<java.lang.String> | getSchema(): Get the schema of this Dataset. |
int | hashCode() |
protected void | initialize(): The initializer; it is called every time a new Dataset instance is created by an AbstractDatasetBuilder. |
Dataset | orderBy(java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> outputformat, Sorter... sorters): Sort this Dataset by the given sorters. |
Dataset | orderBy(org.apache.hadoop.fs.Path output, java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> outputformat, Sorter... sorters): Sort this Dataset by the given sorters. |
Dataset | orderBy(org.apache.hadoop.fs.Path output, Sorter... sorters): Sort this Dataset by the given sorters. |
Dataset | orderBy(Sorter... sorters): Sort this Dataset by the given sorters. |
protected void | setInputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat> input_format): Specify the InputFormat of this dataset. |
protected void | setMapper(java.lang.Class<? extends AbstractMobiusMapper> mapper): Set the AbstractMobiusMapper for this dataset. |
protected void | setSchema(java.lang.String... schema): Specify the schema of this dataset. |
java.lang.String | toString(): Return a string containing the name of this dataset and its schema. |
protected void | validate(): Validate that this dataset has all the required parameters. |
boolean | withinSchema(java.lang.String aColumn): Check whether the given aColumn is defined in this dataset. |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Field Detail |
---|
protected java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat> input_format
The InputFormat of this dataset, so Hadoop knows how to read this dataset.
protected java.lang.Class<? extends AbstractMobiusMapper> mapper
The corresponding AbstractMobiusMapper implementation, which parses the records of this dataset into Tuples.
protected TupleCriterion tupleConstraint
The tuple constraint. If specified, only tuples that pass this constraint will be emitted.
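The "only tuples that pass this constraint will be emitted" rule for tupleConstraint is a filter. As a hypothetical sketch, assuming a tuple can be modeled as a column-name to value map and the constraint as a java.util.function.Predicate (neither is Mobius's actual Tuple or TupleCriterion type), it behaves like this:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model: a tuple is a column-name -> value map and the constraint is a
// java.util.function.Predicate standing in for Mobius's TupleCriterion.
final class TupleConstraintSketch {

    // Emit only the tuples that pass the constraint; with no constraint, emit every tuple.
    static List<Map<String, Object>> emit(List<Map<String, Object>> tuples,
                                          Predicate<Map<String, Object>> constraint) {
        if (constraint == null) {
            return tuples;
        }
        return tuples.stream().filter(constraint).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Object> cheap = Map.of("price", 5);
        Map<String, Object> pricey = Map.of("price", 50);
        Predicate<Map<String, Object>> overTen = t -> (Integer) t.get("price") > 10;
        System.out.println(emit(List.of(cheap, pricey), overTen));
    }
}
```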
protected transient org.apache.hadoop.conf.Configuration conf
The Hadoop configuration.
protected transient java.util.ArrayList<ComputedColumns> computedColumns
Stores the user-defined ComputedColumns of this dataset, if any.
protected java.util.LinkedHashSet<java.lang.String> schema
The schema of this Dataset, using a LinkedHashSet to preserve the schema order.
protected java.lang.String name
The name of this dataset.
protected transient MobiusJob job
The Mobius job that contains the analysis flow.
Constructor Detail |
---|
protected Dataset(MobiusJob job, java.lang.String name)
Method Detail |
---|
protected java.util.LinkedHashSet<java.lang.String> getSchema()
Get the schema of this Dataset.
The returned set is a LinkedHashSet, so the schema is ordered by insertion order.
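The ordering guarantee of getSchema() comes from the standard LinkedHashSet iteration contract. A minimal sketch using only the JDK (SchemaOrderSketch and schemaOf are hypothetical names, not part of Mobius) shows that columns keep their first-insertion order and that a repeated column is stored once:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper: builds a schema the way a setSchema(String...) call could,
// relying only on standard LinkedHashSet behavior.
final class SchemaOrderSketch {

    // Columns keep their first-insertion order; a repeated column is stored once.
    static LinkedHashSet<String> schemaOf(String... columns) {
        return new LinkedHashSet<>(Arrays.asList(columns));
    }

    public static void main(String[] args) {
        LinkedHashSet<String> schema = schemaOf("user_id", "item_id", "price", "item_id");
        List<String> ordered = new ArrayList<>(schema);
        System.out.println(ordered); // iteration follows insertion order
    }
}
```

A plain HashSet would also drop duplicates but would not promise any iteration order, which is why the schema field uses LinkedHashSet.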
public org.apache.hadoop.mapred.JobConf createJobConf(int jobSequenceNumber) throws java.io.IOException
Create a Hadoop JobConf that represents this dataset. This method is called by Mobius.
Throws: java.io.IOException
public java.lang.String getDatasetID(int jobSequenceNumber)
Get the ID for this dataset.
A dataset ID is composed of a two-digit integer (from the jobSequenceNumber) and the name of the dataset.
This method is used by the Mobius engine only.
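The exact ID layout is not spelled out here. Assuming the sequence number is zero-padded to two digits and joined to the dataset name with an underscore (both the padding and the separator are assumptions, and DatasetIdSketch is a hypothetical name), the composition looks like this:

```java
import java.util.Locale;

// Hypothetical sketch of composing a dataset ID. The "%02d" padding and the "_"
// separator are assumptions; the documentation only says the ID combines a
// two-digit integer from jobSequenceNumber with the dataset name.
final class DatasetIdSketch {

    static String datasetId(int jobSequenceNumber, String name) {
        if (jobSequenceNumber < 0) {
            throw new IllegalArgumentException("jobSequenceNumber must be non-negative");
        }
        // %02d pads 0..9 to two digits; values of 100 or more would produce three digits.
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%02d_%s", jobSequenceNumber, name);
    }

    public static void main(String[] args) {
        System.out.println(datasetId(3, "orders"));
    }
}
```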
protected void setSchema(java.lang.String... schema)
Specify the schema of this dataset.
protected void addComputedColumn(ComputedColumns aComputedColumn)
Add a ComputedColumns to this dataset.
This method is called by an implementation of AbstractDatasetBuilder.
protected void initialize()
The initializer; it is called every time a new Dataset instance is created by an AbstractDatasetBuilder.
protected void setInputFormat(java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat> input_format)
Specify the InputFormat of this dataset.
This method is called by the corresponding implementation of AbstractDatasetBuilder.
protected void setMapper(java.lang.Class<? extends AbstractMobiusMapper> mapper)
Set the AbstractMobiusMapper for this dataset.
This method is called by the corresponding implementation of AbstractDatasetBuilder.
public java.util.List<org.apache.hadoop.fs.Path> getInputs()
Get the input paths of this dataset. Paths are specified by the user during the dataset building process.
public java.lang.Class<? extends AbstractMobiusMapper> getMapper()
Get the AbstractMobiusMapper of this dataset.
public java.lang.Class<? extends org.apache.hadoop.mapred.InputFormat> getInputFormat()
Get the InputFormat of this dataset.
public boolean withinSchema(java.lang.String aColumn)
Check whether the given aColumn is defined in this dataset.
Parameters: aColumn - the name of a column.
Returns: true if aColumn is defined in this dataset (case insensitive), false otherwise.
protected void validate()
Validate that this dataset has all the required parameters.
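The case-insensitive check described for withinSchema can be sketched with String.equalsIgnoreCase. WithinSchemaSketch is a hypothetical class, not the Mobius implementation, and treating a null column as "not defined" is an assumption:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of a case-insensitive schema lookup like withinSchema(String).
final class WithinSchemaSketch {
    private final Set<String> schema;

    WithinSchemaSketch(String... columns) {
        this.schema = new LinkedHashSet<>(Arrays.asList(columns));
    }

    // Linear scan with equalsIgnoreCase: O(n) per lookup. Keeping a lower-cased
    // copy of the set would make each lookup O(1) at the cost of extra memory.
    boolean withinSchema(String aColumn) {
        if (aColumn == null) {
            return false; // assumption: null is never a defined column
        }
        for (String column : schema) {
            if (column.equalsIgnoreCase(aColumn)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        WithinSchemaSketch s = new WithinSchemaSketch("user_id", "Price");
        System.out.println(s.withinSchema("PRICE") + " " + s.withinSchema("qty"));
    }
}
```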
public java.lang.String getName()
Get the name of this dataset. The name of a dataset is specified during the dataset building process.
public java.lang.String toString()
Return a string containing the name of this dataset and its schema.
Overrides: toString in class java.lang.Object
public boolean equals(java.lang.Object obj)
Returns: true only if obj is an instance of Dataset and the name, input format, mapper, and schema of this dataset and obj are all equal; otherwise, false.
Overrides: equals in class java.lang.Object
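An equals() over name, input format, mapper, and schema needs a hashCode() over the same fields to honor the java.lang.Object contract. The hypothetical DatasetKey class below (Strings stand in for the Class objects) is a sketch of that pairing, not the Mobius source. One consequence worth noting: Set.equals ignores iteration order, so two schemas with the same columns in a different order compare equal even though LinkedHashSet preserves that order.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Objects;

// Hypothetical value class comparing the same four properties the equals() contract names:
// name, input format, mapper, and schema. Strings stand in for the Class objects.
final class DatasetKey {
    final String name;
    final String inputFormat;
    final String mapper;
    final LinkedHashSet<String> schema;

    DatasetKey(String name, String inputFormat, String mapper, String... schema) {
        this.name = name;
        this.inputFormat = inputFormat;
        this.mapper = mapper;
        this.schema = new LinkedHashSet<>(Arrays.asList(schema));
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof DatasetKey)) {
            return false;
        }
        DatasetKey other = (DatasetKey) obj;
        return Objects.equals(name, other.name)
                && Objects.equals(inputFormat, other.inputFormat)
                && Objects.equals(mapper, other.mapper)
                && Objects.equals(schema, other.schema); // Set equality ignores order
    }

    // Hash exactly the fields used by equals(), so equal objects get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(name, inputFormat, mapper, schema);
    }

    public static void main(String[] args) {
        DatasetKey a = new DatasetKey("orders", "TextInputFormat", "OrdersMapper", "id", "price");
        DatasetKey b = new DatasetKey("orders", "TextInputFormat", "OrdersMapper", "id", "price");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

If column order should matter for equality, comparing new ArrayList<>(schema) values instead of the sets would make it so.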
public Dataset orderBy(Sorter... sorters) throws java.io.IOException
Sort this Dataset by the given sorters. The result is written with SequenceFileOutputFormat (binary output).
Throws: java.io.IOException
public Dataset orderBy(java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> outputformat, Sorter... sorters) throws java.io.IOException
Sort this Dataset by the given sorters. The result is written with the given outputformat.
Throws: java.io.IOException
public Dataset orderBy(org.apache.hadoop.fs.Path output, Sorter... sorters) throws java.io.IOException
Sort this Dataset by the given sorters. The result is written to output with TextOutputFormat (text output).
Throws: java.io.IOException
public Dataset orderBy(org.apache.hadoop.fs.Path output, java.lang.Class<? extends org.apache.hadoop.mapred.FileOutputFormat> outputformat, Sorter... sorters) throws java.io.IOException
Sort this Dataset by the given sorters. The result is written to output with the given outputformat.
Throws: java.io.IOException
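The four orderBy overloads differ only in which of output and outputformat the caller supplies. The hypothetical model below records how the defaults resolve: SequenceFileOutputFormat when neither is given, TextOutputFormat when only output is given. It does not sort anything. Sorter here is a stand-in class rather than Mobius's Sorter, output formats are plain strings, and java.nio.file.Path replaces org.apache.hadoop.fs.Path; where the result lands when no output is passed is not stated, so the sketch leaves it empty.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical model of how the four orderBy overloads pick their defaults.
// Sorter is a stand-in type, output formats are class-name strings, and
// java.nio.file.Path replaces org.apache.hadoop.fs.Path.
final class OrderBySketch {

    static final class Sorter {
        final String column;

        Sorter(String column) {
            this.column = column;
        }
    }

    static final String SEQUENCE_FILE = "SequenceFileOutputFormat"; // binary output
    static final String TEXT = "TextOutputFormat";                  // text output

    final Optional<Path> output; // empty: the location is not chosen by the caller
    final String outputFormat;
    final List<Sorter> sorters;

    private OrderBySketch(Optional<Path> output, String outputFormat, Sorter[] sorters) {
        this.output = output;
        this.outputFormat = outputFormat;
        this.sorters = Arrays.asList(sorters);
    }

    // orderBy(sorters): no path, no format, so binary SequenceFile output.
    static OrderBySketch orderBy(Sorter... sorters) {
        return new OrderBySketch(Optional.empty(), SEQUENCE_FILE, sorters);
    }

    // orderBy(outputformat, sorters): the caller picks the format only.
    static OrderBySketch orderBy(String outputFormat, Sorter... sorters) {
        return new OrderBySketch(Optional.empty(), outputFormat, sorters);
    }

    // orderBy(output, sorters): the caller picks the path; text output by default.
    static OrderBySketch orderBy(Path output, Sorter... sorters) {
        return orderBy(output, TEXT, sorters);
    }

    // orderBy(output, outputformat, sorters): the caller picks both.
    static OrderBySketch orderBy(Path output, String outputFormat, Sorter... sorters) {
        return new OrderBySketch(Optional.of(output), outputFormat, sorters);
    }

    public static void main(String[] args) {
        Sorter byPrice = new Sorter("price");
        System.out.println(orderBy(byPrice).outputFormat);
        System.out.println(orderBy(Paths.get("/tmp/sorted"), byPrice).outputFormat);
    }
}
```

Routing the path-only overload through the fully specified one keeps the text-output default in a single place.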
public int hashCode()
Overrides: hashCode in class java.lang.Object