Usage examples for org.apache.hadoop.mapreduce.TaskInputOutputContext#getOutputCommitter
public OutputCommitter getOutputCommitter();
From source file:com.bonc.mr_roamRecognition_hjpt.comm.NewFileOutputFormat.java
License:Apache License
/**
 * Get the {@link Path} to the task's temporary output directory for the
 * map-reduce job.
 *
 * <h4 id="SideEffectFiles">Tasks' Side-Effect Files</h4>
 *
 * <p>
 * Some applications need to create/write-to side-files, which differ from
 * the actual job-outputs.
 * </p>
 *
 * <p>
 * In such cases there could be issues with two instances of the same TIP
 * (running simultaneously, e.g. speculative tasks) trying to open/write-to
 * the same file (path) on HDFS. Hence the application-writer has to pick
 * unique names per task-attempt (e.g. using the attempt id, say
 * <tt>attempt_200709221812_0001_m_000000_0</tt>), not just per TIP.
 * </p>
 *
 * <p>
 * To get around this, the Map-Reduce framework helps the application-writer
 * by maintaining a special
 * <tt>${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid}</tt>
 * sub-directory for each task-attempt on HDFS where the output of the
 * task-attempt goes. On successful completion of the task-attempt, the files
 * in
 * <tt>${mapreduce.output.fileoutputformat.outputdir}/_temporary/_${taskid}</tt>
 * (only) are <i>promoted</i> to
 * <tt>${mapreduce.output.fileoutputformat.outputdir}</tt>. The framework
 * discards the sub-directories of unsuccessful task-attempts. This is
 * completely transparent to the application.
 * </p>
 *
 * <p>
 * The application-writer can take advantage of this by creating any
 * side-files required in the work directory during execution of the task,
 * i.e. via {@link #getWorkOutputPath(TaskInputOutputContext)}, and the
 * framework will move them out similarly, so the task does not have to
 * pick unique paths per task-attempt.
 * </p>
 *
 * <p>
 * The entire discussion holds true for maps of jobs with reducer=NONE (i.e.
 * 0 reduces), since the output of the map, in that case, goes directly to
 * HDFS.
 * </p>
 *
 * @return the {@link Path} to the task's temporary output directory for the
 *         map-reduce job.
 */
public static Path getWorkOutputPath(TaskInputOutputContext<?, ?, ?, ?> context)
        throws IOException, InterruptedException {
    FileOutputCommitter committer = (FileOutputCommitter) context.getOutputCommitter();
    return committer.getWorkPath();
}
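The commit protocol the Javadoc above describes (each task attempt writes into its own <tt>_temporary/_${taskid}</tt> work directory, and on success only that attempt's files are promoted into the output directory) can be simulated outside Hadoop with plain java.nio.file. The following is a hedged, stand-alone sketch of that idea only: `CommitProtocolDemo` and its helper methods are illustrative names, not Hadoop APIs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Stand-alone simulation (not Hadoop code) of the side-effect-file commit
// protocol: write under ${outputdir}/_temporary/_${taskid}, then promote
// the successful attempt's files and discard the losing attempt's directory.
public class CommitProtocolDemo {

    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("job-output");

        // Two speculative attempts of the same TIP write the SAME file name,
        // but into distinct per-attempt work directories, so they never clash.
        Path attempt0 = workDir(outputDir, "attempt_200709221812_0001_m_000000_0");
        Path attempt1 = workDir(outputDir, "attempt_200709221812_0001_m_000000_1");
        Files.writeString(attempt0.resolve("part-m-00000"), "from attempt 0");
        Files.writeString(attempt1.resolve("part-m-00000"), "from attempt 1");

        commit(outputDir, attempt0);   // winner's files are promoted
        discard(attempt1);             // loser's sub-directory is dropped

        System.out.println(Files.readString(outputDir.resolve("part-m-00000")));
    }

    // The per-attempt work directory: ${outputdir}/_temporary/_${taskid}
    static Path workDir(Path outputDir, String taskId) throws IOException {
        return Files.createDirectories(
                outputDir.resolve("_temporary").resolve("_" + taskId));
    }

    // Promote every file in the attempt's work directory into the output dir.
    static void commit(Path outputDir, Path workDir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(workDir)) {
            for (Path f : files) {
                Files.move(f, outputDir.resolve(f.getFileName()));
            }
        }
    }

    // Discard an unsuccessful attempt's work directory and its contents.
    static void discard(Path workDir) throws IOException {
        try (DirectoryStream<Path> files = Files.newDirectoryStream(workDir)) {
            for (Path f : files) {
                Files.delete(f);
            }
        }
        Files.delete(workDir);
    }
}
```

In real jobs none of this bookkeeping is written by hand: `getWorkOutputPath` hands the task its per-attempt directory, and the `FileOutputCommitter` performs the promotion, which is exactly why the Javadoc calls the mechanism transparent to the application.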
From source file:edu.arizona.cs.hadoop.fs.irods.output.HirodsFileOutputFormat.java
License:Apache License
/**
 * Get the {@link Path} to the task's temporary output directory for the
 * map-reduce job.
 *
 * <h4 id="SideEffectFiles">Tasks' Side-Effect Files</h4>
 *
 * <p>
 * Some applications need to create/write-to side-files, which differ from
 * the actual job-outputs.
 * </p>
 *
 * <p>
 * In such cases there could be issues with two instances of the same TIP
 * (running simultaneously, e.g. speculative tasks) trying to open/write-to
 * the same file (path) on HDFS. Hence the application-writer has to pick
 * unique names per task-attempt (e.g. using the attempt id, say
 * <tt>attempt_200709221812_0001_m_000000_0</tt>), not just per TIP.</p>
 *
 * <p>
 * To get around this, the Map-Reduce framework helps the application-writer
 * by maintaining a special
 * <tt>${mapred.output.dir}/_temporary/_${taskid}</tt>
 * sub-directory for each task-attempt on HDFS where the output of the
 * task-attempt goes. On successful completion of the task-attempt, the files
 * in <tt>${mapred.output.dir}/_temporary/_${taskid}</tt> (only) are
 * <i>promoted</i> to <tt>${mapred.output.dir}</tt>. The framework discards
 * the sub-directories of unsuccessful task-attempts. This is completely
 * transparent to the application.</p>
 *
 * <p>
 * The application-writer can take advantage of this by creating any
 * side-files required in the work directory during execution of the task,
 * i.e. via {@link #getWorkOutputPath(TaskInputOutputContext)}, and the
 * framework will move them out similarly, so the task does not have to
 * pick unique paths per task-attempt.</p>
 *
 * <p>
 * The entire discussion holds true for maps of jobs with reducer=NONE (i.e.
 * 0 reduces), since the output of the map, in that case, goes directly to
 * HDFS.</p>
 *
 * @return the {@link Path} to the task's temporary output directory for the
 *         map-reduce job.
 */
public static Path getWorkOutputPath(TaskInputOutputContext<?, ?, ?, ?> context)
        throws IOException, InterruptedException {
    HirodsFileOutputCommitter committer = (HirodsFileOutputCommitter) context.getOutputCommitter();
    return committer.getWorkPath();
}