Example usage for org.apache.hadoop.mapreduce TaskInputOutputContext getOutputCommitter

Introduction

This page collects example usages of the getOutputCommitter method of org.apache.hadoop.mapreduce.TaskInputOutputContext.

Prototype

public OutputCommitter getOutputCommitter();

Document

Get the OutputCommitter for the task-attempt.
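Before the usage listings below, here is a minimal sketch of calling getOutputCommitter from inside a task. Mapper.Context extends TaskInputOutputContext, so the method is available in setup, map, and cleanup. The class name and the configuration key are illustrative assumptions, not part of any real job:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

// Illustrative mapper that inspects the committer the framework supplies.
public class CommitterAwareMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // Context extends TaskInputOutputContext, so getOutputCommitter() is available.
        OutputCommitter committer = context.getOutputCommitter();

        // With a FileOutputFormat-based job the committer is typically a
        // FileOutputCommitter, whose work path points at this task-attempt's
        // _temporary sub-directory.
        if (committer instanceof FileOutputCommitter) {
            Path workPath = ((FileOutputCommitter) committer).getWorkPath();
            context.getConfiguration().set("example.work.path", workPath.toString());
        }
    }
}
```

The instanceof check matters because jobs can configure committers that are not FileOutputCommitter, in which case there is no work path to query.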

Usage

From source file:com.bonc.mr_roamRecognition_hjpt.comm.NewFileOutputFormat.java

License:Apache License

/**
 * Get the {@link Path} to the task's temporary output directory for the
 * map-reduce job.
 * 
 * <h4 id="SideEffectFiles">Tasks' Side-Effect Files</h4>
 * 
 * <p>
 * Some applications need to create/write-to side-files, which differ from
 * the actual job-outputs.
 * 
 * <p>
 * In such cases there could be issues with 2 instances of the same TIP
 * (running simultaneously e.g. speculative tasks) trying to open/write-to
 * the same file (path) on HDFS. Hence the application-writer will have to
 * pick unique names per task-attempt (e.g. using the attemptid, say
 * <tt>attempt_200709221812_0001_m_000000_0</tt>), not just per TIP.
 * </p>
 * 
 * <p>
 * To get around this the Map-Reduce framework helps the application-writer
 * out by maintaining a special
 * <tt>${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid}</tt>
 * sub-directory for each task-attempt on HDFS where the output of the
 * task-attempt goes. On successful completion of the task-attempt the files
 * in the
 * <tt>${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid}</tt>
 * (only) are <i>promoted</i> to
 * <tt>${mapreduce.output.fileoutputformat.outputdir}</tt>. Of course, the
 * framework discards the sub-directory of unsuccessful task-attempts. This
 * is completely transparent to the application.
 * </p>
 * 
 * <p>
 * The application-writer can take advantage of this by creating any
 * side-files required in a work directory during execution of his task i.e.
 * via {@link #getWorkOutputPath(TaskInputOutputContext)}, and the framework
 * will move them out similarly - thus she doesn't have to pick unique paths
 * per task-attempt.
 * </p>
 * 
 * <p>
 * The entire discussion holds true for maps of jobs with reducer=NONE (i.e.
 * 0 reduces) since output of the map, in that case, goes directly to HDFS.
 * </p>
 * 
 * @return the {@link Path} to the task's temporary output directory for the
 *         map-reduce job.
 */
public static Path getWorkOutputPath(TaskInputOutputContext<?, ?, ?, ?> context)
        throws IOException, InterruptedException {
    FileOutputCommitter committer = (FileOutputCommitter) context.getOutputCommitter();
    return committer.getWorkPath();
}
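The Javadoc above describes how files written under the work directory are promoted to the job output directory on successful commit. A hedged sketch of writing such a side-effect file via the getWorkOutputPath helper from this source file (the SideFileWriter class and file name are illustrative assumptions):

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskInputOutputContext;

public class SideFileWriter {

    // Writes a side-effect file into the task-attempt's work directory.
    // The framework promotes it to the job output directory on successful
    // commit and discards it on failure, so no per-attempt unique naming
    // is needed.
    public static void writeSideFile(TaskInputOutputContext<?, ?, ?, ?> context,
            String name, byte[] data) throws IOException, InterruptedException {
        Path workDir = NewFileOutputFormat.getWorkOutputPath(context);
        Path sideFile = new Path(workDir, name);
        FileSystem fs = sideFile.getFileSystem(context.getConfiguration());
        try (FSDataOutputStream out = fs.create(sideFile, false)) {
            out.write(data);
        }
    }
}
```

Passing false to fs.create avoids silently overwriting an existing file; a speculative duplicate attempt writes into its own _temporary sub-directory, so the two attempts never collide.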

From source file:edu.arizona.cs.hadoop.fs.irods.output.HirodsFileOutputFormat.java

License:Apache License

/**
 * Get the {@link Path} to the task's temporary output directory for the
 * map-reduce job.
 *
 * <h4 id="SideEffectFiles">Tasks' Side-Effect Files</h4>
 *
 * <p>
 * Some applications need to create/write-to side-files, which differ from
 * the actual job-outputs.
 *
 * <p>
 * In such cases there could be issues with 2 instances of the same TIP
 * (running simultaneously e.g. speculative tasks) trying to open/write-to
 * the same file (path) on HDFS. Hence the application-writer will have to
 * pick unique names per task-attempt (e.g. using the attemptid, say
 * <tt>attempt_200709221812_0001_m_000000_0</tt>), not just per TIP.</p>
 *
 * <p>
 * To get around this the Map-Reduce framework helps the application-writer
 * out by maintaining a special
 * <tt>${mapred.output.dir}/_temporary/_${taskid}</tt>
 * sub-directory for each task-attempt on HDFS where the output of the
 * task-attempt goes. On successful completion of the task-attempt the files
 * in the <tt>${mapred.output.dir}/_temporary/_${taskid}</tt> (only) are
 * <i>promoted</i> to <tt>${mapred.output.dir}</tt>. Of course, the
 * framework discards the sub-directory of unsuccessful task-attempts. This
 * is completely transparent to the application.</p>
 *
 * <p>
 * The application-writer can take advantage of this by creating any
 * side-files required in a work directory during execution of his task i.e.
 * via {@link #getWorkOutputPath(TaskInputOutputContext)}, and the framework
 * will move them out similarly - thus she doesn't have to pick unique paths
 * per task-attempt.</p>
 *
 * <p>
 * The entire discussion holds true for maps of jobs with reducer=NONE (i.e.
 * 0 reduces) since output of the map, in that case, goes directly to
 * HDFS.</p>
 *
 * @return the {@link Path} to the task's temporary output directory for the
 * map-reduce job.
 */
public static Path getWorkOutputPath(TaskInputOutputContext<?, ?, ?, ?> context)
        throws IOException, InterruptedException {
    HirodsFileOutputCommitter committer = (HirodsFileOutputCommitter) context.getOutputCommitter();
    return committer.getWorkPath();
}