You are running a program that takes a long time to execute, and you need to present the user with an estimated time until completion.
Use Commons Math's SimpleRegression and Commons Lang's StopWatch to
create a ProcessEstimator class that predicts when a particular program
will finish. Suppose your program needs to process a number of records
and could take a few hours to complete. You would like to provide some
feedback, and, if you are confident that each record will take roughly
the same amount of time, you can use SimpleRegression's slope and
intercept to estimate when all records will be processed. Example 8-1
defines the ProcessEstimator class, which combines StopWatch and
SimpleRegression to estimate the time remaining in a process.
Example 8-1. ProcessEstimator to estimate time of program execution
package com.discursive.jccook.math.timeestimate;

import org.apache.commons.lang.time.StopWatch;
// Later Commons Math releases ship SimpleRegression in
// org.apache.commons.math.stat.regression; adjust this import to match
// the version you are using.
import org.apache.commons.math.stat.multivariate.SimpleRegression;

public class ProcessEstimator {

    private SimpleRegression regression = new SimpleRegression( );
    private StopWatch stopWatch = new StopWatch( );

    // Total number of units
    private int units = 0;

    // Number of units completed
    private int completed = 0;

    // Sample rate for regression
    private int sampleRate = 1;

    public ProcessEstimator( int numUnits, int sampleRate ) {
        this.units = numUnits;
        this.sampleRate = sampleRate;
    }

    public void start( ) {
        stopWatch.start( );
    }

    public void stop( ) {
        stopWatch.stop( );
    }

    public void unitCompleted( ) {
        completed++;

        // Every sampleRate units, record (units remaining, time elapsed)
        if( completed % sampleRate == 0 ) {
            regression.addData( units - completed, stopWatch.getTime( ));
        }
    }

    // Projected total time: elapsed time when zero units remain
    public long projectedFinish( ) {
        return (long) regression.getIntercept( );
    }

    public long getTimeSpent( ) {
        return stopWatch.getTime( );
    }

    public long projectedTimeRemaining( ) {
        long timeRemaining = projectedFinish( ) - getTimeSpent( );
        return timeRemaining;
    }

    public int getUnits( ) {
        return units;
    }

    public int getCompleted( ) {
        return completed;
    }
}
ProcessEstimator has a constructor that takes the number of records to
process and the sample rate at which to measure progress. With 10,000
records to process and a sample rate of 100, ProcessEstimator adds a
data point of units remaining versus time elapsed to the
SimpleRegression after every 100 records. As the program continues to
execute, projectedTimeRemaining( ) returns an updated estimate of the
time remaining by retrieving the y-intercept from SimpleRegression and
subtracting the time already spent in execution. Here x is the number
of records remaining and y is the time elapsed; as x decreases, y
increases. The y-intercept is the value of y when x equals zero, so it
represents the projected total time to process all records.
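To make the role of the intercept concrete, the following sketch fits a
least-squares line by hand through three made-up samples. It is an
illustration under stated assumptions, not Commons Math's
implementation: the class name InterceptSketch, the fit helper, and the
sample values (10,000 records at a steady 5 ms each) are all
hypothetical.

```java
class InterceptSketch {

    /** Returns {slope, intercept} of the least-squares line through (x[i], y[i]). */
    public static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] { slope, intercept };
    }

    public static void main(String[] args) {
        // Hypothetical samples: x = records remaining, y = milliseconds elapsed
        double[] remaining = { 9000, 8000, 7000 };
        double[] elapsedMs = { 5000, 10000, 15000 };

        double[] line = fit(remaining, elapsedMs);
        System.out.println("ms per record: " + (-line[0]));      // prints 5.0
        System.out.println("projected total ms: " + line[1]);    // prints 50000.0
    }
}
```

With perfectly steady samples, the slope is -5 ms per remaining record
and the intercept, 50,000 ms, is the projected total. Subtracting the
15,000 ms already elapsed leaves 35,000 ms remaining.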
The ProcessEstimationExample in Example 8-2 uses the ProcessEstimator
to estimate the time remaining while calling the
performLengthyProcess( ) method 10,000 times.
Example 8-2. An example using the ProcessEstimator
package com.discursive.jccook.math.timeestimate;

import org.apache.commons.lang.math.RandomUtils;

public class ProcessEstimationExample {

    private ProcessEstimator estimate;

    public static void main(String[] args) {
        ProcessEstimationExample example = new ProcessEstimationExample( );
        example.begin( );
    }

    public void begin( ) {
        estimate = new ProcessEstimator( 10000, 100 );
        estimate.start( );

        for( int i = 0; i < 10000; i++ ) {
            // Print status every 1000 items
            printStatus(i);
            performLengthyProcess( );
            estimate.unitCompleted( );
        }

        estimate.stop( );

        System.out.println( "Completed " + estimate.getUnits( ) + " in " +
            Math.round( estimate.getTimeSpent( ) / 1000 ) + " seconds." );
    }

    private void printStatus(int i) {
        if( i % 1000 == 0 ) {
            System.out.println( "Completed: " + estimate.getCompleted( ) +
                " of " + estimate.getUnits( ) );
            System.out.println( "\tTime Spent: " +
                Math.round( estimate.getTimeSpent( ) / 1000) + " sec" +
                ", Time Remaining: " +
                Math.round( estimate.projectedTimeRemaining( ) / 1000) + " sec" );
        }
    }

    private void performLengthyProcess( ) {
        try {
            // Simulate work: sleep for 0 to 9 milliseconds
            Thread.sleep(RandomUtils.nextInt(10));
        } catch( InterruptedException e ) {
            // Restore the interrupt flag so callers can see the interruption
            Thread.currentThread( ).interrupt( );
        }
    }
}
After each call to performLengthyProcess( ), the unitCompleted( )
method on ProcessEstimator is invoked. Every 100th call to
unitCompleted( ) causes ProcessEstimator to update SimpleRegression
with the number of records remaining and the amount of time spent so
far. Once every 1,000 iterations, starting with the first, a status
message is printed to the console just before performLengthyProcess( )
is called:
Completed: 0 of 10000
        Time Spent: 0 sec, Time Remaining: 0 sec
Completed: 1000 of 10000
        Time Spent: 4 sec, Time Remaining: 42 sec
Completed: 2000 of 10000
        Time Spent: 9 sec, Time Remaining: 38 sec
Completed: 3000 of 10000
        Time Spent: 14 sec, Time Remaining: 33 sec
Completed: 4000 of 10000
        Time Spent: 18 sec, Time Remaining: 28 sec
Completed: 5000 of 10000
        Time Spent: 24 sec, Time Remaining: 23 sec
Completed: 6000 of 10000
        Time Spent: 28 sec, Time Remaining: 19 sec
Completed: 7000 of 10000
        Time Spent: 33 sec, Time Remaining: 14 sec
Completed: 8000 of 10000
        Time Spent: 38 sec, Time Remaining: 9 sec
Completed: 9000 of 10000
        Time Spent: 43 sec, Time Remaining: 4 sec
Completed 10000 in 47 seconds.
As shown above, the output periodically displays the amount of time you
can expect the program to continue executing. Initially, there is no
data to make a prediction with, so the ProcessEstimator returns zero
seconds, but, as the program executes the performLengthyProcess( )
method 10,000 times, a meaningful time remaining is produced.
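The zero in the first status line comes from a conversion detail:
SimpleRegression documents that getIntercept( ) returns Double.NaN
until it has at least two data points with different x values, and Java
converts NaN to 0 when casting to long. If a zero is misleading for
your users, one option is to report an explicit "unknown" value
instead. The sketch below shows the idea in isolation; the UNKNOWN
sentinel and the toMillisOrUnknown helper are hypothetical names, not
part of the recipe.

```java
class UnknownEstimateSketch {

    /** Sentinel meaning "no estimate yet"; a hypothetical convention. */
    public static final long UNKNOWN = -1L;

    /** Converts a regression intercept to milliseconds, or UNKNOWN if it is NaN. */
    public static long toMillisOrUnknown(double intercept) {
        return Double.isNaN(intercept) ? UNKNOWN : (long) intercept;
    }

    public static void main(String[] args) {
        // Stand-in for what the regression reports before it can fit a line
        double noDataYet = Double.NaN;

        System.out.println((long) noDataYet);              // prints 0
        System.out.println(toMillisOrUnknown(noDataYet));  // prints -1
        System.out.println(toMillisOrUnknown(50000.0));    // prints 50000
    }
}
```

A caller could then print something like "estimating..." whenever the
helper returns UNKNOWN, rather than a remaining time of zero seconds.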
The previous example used a method that sleeps for a random number of
milliseconds between 0 and 9, and this value is selected using the
RandomUtils class described in Recipe 8.4. It is easy to predict how
long this process is going to take because, on average, each method
call sleeps for roughly five milliseconds. The ProcessEstimator is
inaccurate when the time to process each record steadily increases or
decreases, or when a block of records takes substantially more or less
time to process than the rest. If the time to process each record does
not remain roughly constant, the relationship between records processed
and time elapsed is not linear. Because the ProcessEstimator uses a
linear model, SimpleRegression, a nonconstant execution time will
produce inaccurate predictions for time remaining. If you are using the
ProcessEstimator, make sure that it takes roughly the same amount of
time to process each record.
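To see how far off a linear model can be, the following sketch
simulates a hypothetical workload of 1,000 records in which record i
takes i milliseconds, samples it every 100 records up to the halfway
point, and fits a least-squares line by hand. The class and method
names are illustrative assumptions, and the hand-rolled fit stands in
for SimpleRegression.

```java
class NonlinearSketch {

    /** Returns the y-intercept of the least-squares line through (x[i], y[i]). */
    public static double intercept(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return meanY - slope * meanX;
    }

    /**
     * Projected total time in ms after observing the first `observed` of
     * `total` records, assuming record i costs i ms (a made-up workload).
     */
    public static double projectedTotal(int total, int observed, int sampleRate) {
        int samples = observed / sampleRate;
        double[] remaining = new double[samples];
        double[] elapsedMs = new double[samples];
        for (int s = 0; s < samples; s++) {
            long done = (long) (s + 1) * sampleRate;
            remaining[s] = total - done;
            // Sum of 1 + 2 + ... + done milliseconds
            elapsedMs[s] = done * (done + 1) / 2.0;
        }
        return intercept(remaining, elapsedMs);
    }

    public static void main(String[] args) {
        double projected = projectedTotal(1000, 500, 100);
        double actual = 1000 * 1001 / 2.0;
        System.out.println("projected total ms: " + projected);  // prints 265500.0
        System.out.println("actual total ms:    " + actual);     // prints 500500.0
    }
}
```

Halfway through this workload, the linear fit projects a total that is
barely more than half of the actual one, so the reported time remaining
would be far too optimistic.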
This recipe refers to the StopWatch class from Commons Lang. For more
information about the StopWatch class, see Recipe 1.19.