org.apache.hadoop.mapred.lib
Class InputSampler<K,V>

java.lang.Object
  extended by org.apache.hadoop.mapred.lib.InputSampler<K,V>
All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool

public class InputSampler<K,V>
extends java.lang.Object
implements org.apache.hadoop.util.Tool

Utility for collecting samples and writing a partition file for TotalOrderPartitioner.

This class is copied from the Hadoop 0.20.x release to keep Mobius compatible with both Hadoop 0.20.x and 0.22 (see https://issues.apache.org/jira/browse/MAPREDUCE-4251).

This product is licensed under the Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0. This product contains portions derived from Apache Hadoop, which is licensed under the Apache License, Version 2.0, available at http://hadoop.apache.org. © 2007 – 2012 eBay Inc., Evan Chiu, Woody Zhou, Neel Sundaresan


Nested Class Summary
static class InputSampler.IntervalSampler<K,V>
          Sample from s splits at regular intervals.
static class InputSampler.RandomSampler<K,V>
          Sample from random points in the input.
static interface InputSampler.Sampler<K,V>
          Interface to sample using an InputFormat.
static class InputSampler.SplitSampler<K,V>
          Samples the first n records from s splits.
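
The difference between the split-based and interval-based strategies can be illustrated with a small, self-contained sketch. This is not the Hadoop source: the class SamplingSketch, its methods firstN and atIntervals, and the use of in-memory lists in place of input splits are all hypothetical, and the real samplers read records through an InputFormat and may select splits differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; each inner list stands in for one input split.
class SamplingSketch {

    /** Split-style sampling: take up to perSplit keys from the start of each split. */
    public static <K> List<K> firstN(List<List<K>> splits, int perSplit) {
        List<K> samples = new ArrayList<>();
        for (List<K> split : splits) {
            for (int i = 0; i < split.size() && i < perSplit; i++) {
                samples.add(split.get(i));
            }
        }
        return samples;
    }

    /**
     * Interval-style sampling: keep a key whenever the fraction of keys kept
     * so far is below freq, which spreads samples evenly through the input.
     */
    public static <K> List<K> atIntervals(List<List<K>> splits, double freq) {
        List<K> samples = new ArrayList<>();
        long seen = 0;
        for (List<K> split : splits) {
            for (K key : split) {
                seen++;
                if ((double) samples.size() / seen < freq) {
                    samples.add(key);
                }
            }
        }
        return samples;
    }
}
```

Taking only the first records of each split is cheap but can be biased when the input is already sorted, whereas interval sampling reads every record of the splits it visits in exchange for a more even spread.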
 
Constructor Summary
InputSampler(org.apache.hadoop.mapred.JobConf conf)
           
 
Method Summary
 org.apache.hadoop.conf.Configuration getConf()
           
 int run(java.lang.String[] args)
          Driver for InputSampler from the command line.
 void setConf(org.apache.hadoop.conf.Configuration conf)
           
static <K,V> void writePartitionFile(org.apache.hadoop.mapred.JobConf job, InputSampler.Sampler<K,V> sampler)
          Write a partition file for the given job, using the Sampler provided.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

InputSampler

public InputSampler(org.apache.hadoop.mapred.JobConf conf)

Method Detail

getConf

public org.apache.hadoop.conf.Configuration getConf()
Specified by:
getConf in interface org.apache.hadoop.conf.Configurable

setConf

public void setConf(org.apache.hadoop.conf.Configuration conf)
Specified by:
setConf in interface org.apache.hadoop.conf.Configurable

writePartitionFile

public static <K,V> void writePartitionFile(org.apache.hadoop.mapred.JobConf job,
                                            InputSampler.Sampler<K,V> sampler)
                               throws java.io.IOException
Write a partition file for the given job, using the Sampler provided. Queries the sampler for a sample keyset, sorts by the output key comparator, selects the keys for each rank, and writes to the destination returned from TotalOrderPartitioner.getPartitionFile(org.apache.hadoop.mapred.JobConf).

Throws:
java.io.IOException
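
The sort-then-select step described above can be sketched in plain Java. This is a simplified illustration under stated assumptions, not the actual implementation: the class PartitionKeySketch and the method selectSplitPoints are hypothetical, the evenly spaced rank formula is an assumption, and the result is returned as a list rather than written to the partition file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of InputSampler.
class PartitionKeySketch {

    /**
     * Sorts a copy of the sampled keys and returns numPartitions - 1 boundary
     * keys taken at evenly spaced ranks of the sorted sample.
     */
    public static <K> List<K> selectSplitPoints(K[] samples,
                                                Comparator<? super K> comparator,
                                                int numPartitions) {
        if (numPartitions < 1 || samples.length < numPartitions) {
            throw new IllegalArgumentException("need at least one sample per partition");
        }
        K[] sorted = Arrays.copyOf(samples, samples.length);
        Arrays.sort(sorted, comparator);

        List<K> splitPoints = new ArrayList<>();
        for (int i = 1; i < numPartitions; i++) {
            // Assumed rank formula: the i-th boundary sits i/numPartitions
            // of the way through the sorted sample.
            int rank = (int) ((long) i * sorted.length / numPartitions);
            splitPoints.add(sorted[rank]);
        }
        return splitPoints;
    }
}
```

With n partitions there are n - 1 boundary keys. If the sample contains many duplicate keys, this simple version can return repeated boundaries, which a production implementation would need to handle.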

run

public int run(java.lang.String[] args)
        throws java.lang.Exception
Driver for InputSampler from the command line. Configures a JobConf instance and calls writePartitionFile(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.lib.InputSampler.Sampler).

Specified by:
run in interface org.apache.hadoop.util.Tool
Throws:
java.lang.Exception