You need to determine whether two variables are related. These variables could be temperature and energy use, or the number of news channels and stress-related ailments; either way, you need to measure the correlation between the two.
Add data points to an instance of Commons Math SimpleRegression. This class will calculate the slope, slope confidence, and a measure of relatedness known as R-square. The SimpleRegression class performs a least squares regression with one independent variable; adding data points to this model refines the parameters of the equation y = ax + b. The following code uses SimpleRegression to find a relationship between two series of values, [0, 1, 2, 3, 4, 5] and [0, 1.2, 2.6, 3.2, 4, 5]:
import org.apache.commons.math.stat.multivariate.SimpleRegression;

SimpleRegression sr = new SimpleRegression( );

// Add data points
sr.addData( 0, 0 );
sr.addData( 1, 1.2 );
sr.addData( 2, 2.6 );
sr.addData( 3, 3.2 );
sr.addData( 4, 4 );
sr.addData( 5, 5 );

// Print the value of y where the line intersects the y axis
System.out.println( "Intercept: " + sr.getIntercept( ) );

// Print the number of data points
System.out.println( "N: " + sr.getN( ) );

// Print the slope and the slope confidence
System.out.println( "Slope: " + sr.getSlope( ) );
System.out.println( "Slope Confidence: " + sr.getSlopeConfidenceInterval( ) );

// Print R-square, a measure of relatedness
System.out.println( "RSquare: " + sr.getRSquare( ) );
This example passes six data points to SimpleRegression and prints the intercept, number of data points, slope, slope confidence, and R-square from SimpleRegression:
Intercept: 0.238
N: 6
Slope: 0.971
Slope Confidence: 0.169
RSquare: 0.985
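To make the slope and intercept less of a black box, the least squares fit of y = ax + b can be sketched in plain Java. This is only an illustration of the textbook formulas, not the Commons Math implementation; the class name LeastSquaresSketch and the method fit( ) are hypothetical names chosen for this example.

```java
// Plain-Java sketch of an ordinary least squares fit of y = ax + b.
// LeastSquaresSketch and fit() are hypothetical names, not part of Commons Math.
class LeastSquaresSketch {

    // Returns { slope, intercept } of the line that minimizes the squared error.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        // Sums of cross-deviations and of squared x deviations
        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }

        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] { slope, intercept };
    }

    public static void main(String[] args) {
        // Same six data points as the SimpleRegression example
        double[] x = { 0, 1, 2, 3, 4, 5 };
        double[] y = { 0, 1.2, 2.6, 3.2, 4, 5 };
        double[] line = fit(x, y);
        System.out.printf("Slope: %.3f%n", line[0]);
        System.out.printf("Intercept: %.3f%n", line[1]);
    }
}
```

For the six data points used above, this sketch should agree with the slope and intercept reported by SimpleRegression (about 0.971 and 0.238).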
R-square is the square of Pearson's product-moment correlation coefficient, R, which can be obtained by calling getR( ) on SimpleRegression. R-square measures how well a straight line describes the relationship between two series of numbers. The parameters to the addData( ) method of SimpleRegression are corresponding x and y values from the two sets of data. If R-square is 1.0, every (x,y) data point lies exactly on the fitted line. In the previous example, R-square is approximately 0.985, which shows that the (x,y) data points added to SimpleRegression have a strong linear relationship.
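The link between R and R-square can be illustrated with a short plain-Java sketch of the Pearson correlation coefficient. It is written from the standard formula and makes no claim about how Commons Math computes the value internally; CorrelationSketch and pearsonR( ) are hypothetical names used only here.

```java
// Plain-Java sketch of Pearson's correlation coefficient R and its square.
// CorrelationSketch and pearsonR() are hypothetical names, not part of Commons Math.
class CorrelationSketch {

    // R = Sxy / sqrt(Sxx * Syy), computed from deviations about the means.
    static double pearsonR(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = { 0, 1, 2, 3, 4, 5 };

        // The series from the first example: strong positive relationship
        double[] rising = { 0, 1.2, 2.6, 3.2, 4, 5 };
        double r1 = pearsonR(x, rising);
        System.out.println("R: " + r1 + ", RSquare: " + (r1 * r1));

        // A perfectly decreasing line, y = 10 - 2x: R is negative,
        // but squaring it discards the sign
        double[] falling = { 10, 8, 6, 4, 2, 0 };
        double r2 = pearsonR(x, falling);
        System.out.println("R: " + r2 + ", RSquare: " + (r2 * r2));
    }
}
```

Because squaring discards the sign, R-square tells you how strong a linear relationship is but not its direction; the sign of R, or of the slope, tells you whether y rises or falls as x increases.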
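R-square itself cannot be negative; it is R, the correlation coefficient returned by getR( ), that ranges from -1.0 to 1.0. If R is 1.0, y increases linearly as x increases; if R is -1.0, y decreases linearly as x increases. A value of 0.0 shows that there is no linear relationship between x and y. The following example demonstrates two series of numbers with no relationship: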
import org.apache.commons.math.stat.multivariate.SimpleRegression;

SimpleRegression sr = new SimpleRegression( );

sr.addData( 400, 100 );
sr.addData( 300, 105 );
sr.addData( 350, 70 );
sr.addData( 200, 50 );
sr.addData( 150, 300 );
sr.addData( 50, 500 );

// Print R-square, a measure of relatedness
System.out.println( "RSquare: " + sr.getRSquare( ) );
The data points added to this SimpleRegression are all over the map; x and y are unrelated, and the R-square value reported for this set of data points is very close to zero:
Intercept: 77.736
N: 12
Slope: 0.142
Slope Confidence: 0.699
RSquare: 0.02
An R-square this close to zero means there is no linear correlation between x and y. This doesn't show that x and y are unrelated; it shows only that a straight line does not describe whatever relationship exists.
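A small plain-Java sketch makes the last point concrete. Here y is completely determined by x (y = x * x), yet because the x values are symmetric around zero, the linear correlation is zero. The data set, the class name NonlinearSketch, and the method rSquare( ) are all assumptions made up for this illustration; none of them come from Commons Math.

```java
// Sketch: a perfect but nonlinear relationship can still have an R-square of zero.
// NonlinearSketch and rSquare() are hypothetical names, not part of Commons Math.
class NonlinearSketch {

    // R-square = Sxy^2 / (Sxx * Syy), the square of Pearson's R.
    static double rSquare(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        double meanX = sumX / n;
        double meanY = sumY / n;

        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return (sxy * sxy) / (sxx * syy);
    }

    public static void main(String[] args) {
        // x is symmetric around zero and y = x * x, so y depends entirely on x
        double[] x = { -3, -2, -1, 0, 1, 2, 3 };
        double[] y = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            y[i] = x[i] * x[i];
        }
        System.out.println("RSquare: " + rSquare(x, y));
    }
}
```

A low R-square is therefore a reason to plot the data or try a different model, not evidence that the two variables have nothing to do with each other.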
For more information about least squares, the technique used by SimpleRegression, see Wikipedia (http://en.wikipedia.org/wiki/Least_squares).
More information about R and R-square can also be found on Wikipedia
(http://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient).