List of usage examples for org.apache.hadoop.conf Configured subclass-usage
From source file com.github.gaoyangthu.demo.mapred.MultiFileWordCount.java
/**
 * MultiFileWordCount is an example to demonstrate the usage of
 * MultiFileInputFormat. This example counts the occurrences of
 * words in the text files under the given input directory.
 */
public class MultiFileWordCount extends Configured implements Tool {
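Every example on this page follows the same `extends Configured implements Tool` pattern. A minimal sketch of that skeleton (the class name `MyJob` is hypothetical; the body of `run` is left as a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical minimal Tool subclass showing the shared skeleton.
public class MyJob extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf(); // set by ToolRunner before run() is called
        // ... configure and submit the actual job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner parses generic options (-D key=value, -conf file, ...)
        // into the Configuration before delegating to run().
        System.exit(ToolRunner.run(new Configuration(), new MyJob(), args));
    }
}
```

Extending `Configured` supplies the `getConf()`/`setConf()` plumbing, so the class only has to implement `run`.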
From source file com.github.gaoyangthu.demo.mapred.PiEstimator.java
/**
* A Map-reduce program to estimate the value of Pi
 * using a quasi-Monte Carlo method.
*
* Mapper:
* Generate points in a unit square
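The core idea can be sketched outside MapReduce: sample points in the unit square and count how many land inside the inscribed quarter circle. The real job distributes a deterministic quasi-random point sequence across mappers; plain `java.util.Random` here is a stand-in for illustration.

```java
import java.util.Random;

public class PiSketch {
    // Sample `samples` points in the unit square; the fraction landing inside
    // the quarter circle of radius 1 approximates pi/4.
    static double estimatePi(long samples, long seed) {
        Random rng = new Random(seed); // stand-in for the job's quasi-random sequence
        long inside = 0;
        for (long i = 0; i < samples; i++) {
            double x = rng.nextDouble();
            double y = rng.nextDouble();
            if (x * x + y * y <= 1.0) {
                inside++; // point fell inside the quarter circle
            }
        }
        // area ratio (pi/4) scaled back up
        return 4.0 * inside / samples;
    }

    public static void main(String[] args) {
        System.out.println(estimatePi(1_000_000L, 42L));
    }
}
```

In the MapReduce version, each mapper counts inside/outside points for its share of the sequence and a single reducer combines the tallies.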
From source file com.github.gaoyangthu.demo.mapred.RandomTextWriter.java
/**
* This program uses map/reduce to just run a distributed job where there is
* no interaction between the tasks and each task writes a large unsorted
* random sequence of words.
* In order for this program to generate data for terasort with a 5-10 words
* per key and 20-100 words per value, have the following config:
From source file com.github.gaoyangthu.demo.mapred.RandomWriter.java
/**
* This program uses map/reduce to just run a distributed job where there is
 * no interaction between the tasks and each task writes a large unsorted
* random binary sequence file of BytesWritable.
* In order for this program to generate data for terasort with 10-byte keys
* and 90-byte values, have the following config:
From source file com.github.gaoyangthu.demo.mapred.SleepJob.java
/**
 * Dummy class for testing the MR framework. Sleeps for a defined period
 * of time in the mapper and reducer. Generates fake input for map/reduce
 * jobs. Note that the generated number of input pairs is on the order
* of <code>numMappers * mapSleepTime / 100</code>, so the job uses
* some disk space.
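The disk-usage estimate quoted above is easy to check with a quick calculation (the numbers below are illustrative, not from the source):

```java
public class SleepJobEstimate {
    // Order-of-magnitude estimate quoted in the SleepJob javadoc:
    // generated input pairs ~ numMappers * mapSleepTime / 100.
    static long estimatedPairs(int numMappers, long mapSleepTimeMs) {
        return numMappers * mapSleepTimeMs / 100;
    }

    public static void main(String[] args) {
        // e.g. 10 mappers sleeping 1000 ms each -> roughly 100 fake input pairs
        System.out.println(estimatedPairs(10, 1000L));
    }
}
```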
From source file com.github.gaoyangthu.demo.mapred.Sort.java
/**
* This is the trivial map/reduce program that does absolutely nothing
* other than use the framework to fragment and sort the input values.
*
* To run: bin/hadoop jar build/hadoop-examples.jar sort
* [-m <i>maps</i>] [-r <i>reduces</i>]
From source file com.github.gaoyangthu.demo.mapred.terasort.TeraGen.java
/**
* Generate the official terasort input data set.
* The user specifies the number of rows and the output directory and this
* class runs a map/reduce program to generate the data.
* The format of the data is:
* <ul>
From source file com.github.gaoyangthu.demo.mapred.terasort.TeraSort.java
/**
* Generates the sampled split points, launches the job, and waits for it to
* finish.
* <p>
* To run the program:
* <b>bin/hadoop jar hadoop-examples-*.jar terasort in-dir out-dir</b>
From source file com.github.gaoyangthu.demo.mapred.terasort.TeraValidate.java
/**
 * Generates one mapper per file that checks that the keys
* are sorted within each file. The mapper also generates
* "$file:begin", first key and "$file:end", last key. The reduce verifies that
* all of the start/end items are in order.
 * Any output from the reduce is a problem report.
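The per-file check described above can be sketched without the MapReduce plumbing: verify that one file's keys are sorted, and emit its first and last key as begin/end markers (the class and method names here are illustrative, not TeraValidate's actual API):

```java
import java.util.Arrays;
import java.util.List;

public class ValidateSketch {
    // Simplified stand-in for the per-file mapper: fail if any key is out of
    // order, otherwise report "$file:begin"/"$file:end" style markers that a
    // reducer could use to verify ordering across file boundaries.
    static String[] checkFile(String file, List<String> keys) {
        String prev = null;
        for (String key : keys) {
            if (prev != null && prev.compareTo(key) > 0) {
                throw new IllegalStateException(
                    file + ": key " + key + " out of order after " + prev);
            }
            prev = key;
        }
        return new String[] { file + ":begin=" + keys.get(0),
                              file + ":end=" + keys.get(keys.size() - 1) };
    }

    public static void main(String[] args) {
        String[] markers = checkFile("part-00000",
                Arrays.asList("apple", "banana", "cherry"));
        System.out.println(Arrays.toString(markers));
    }
}
```

The real reducer then only has to confirm that each file's begin marker sorts after the previous file's end marker.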
From source file com.github.karahiyo.hadoop.mapreduce.examples.dancing.DistributedPentomino.java
/**
* Launch a distributed pentomino solver.
* It generates a complete list of prefixes of length N with each unique prefix
* as a separate line. A prefix is a sequence of N integers that denote the
 * index of the row that is chosen for each column in order. Note that the
 * next column is heuristically chosen by the solver, so it is dependent on