Correlation measures the degree of interdependence (association) between two variables. If two variables are related such that an increase or decrease in one tends to accompany an increase or decrease in the other, the two variables are said to be correlated. Note that two unrelated variables, such as automobile sales and demand for shoes, may happen to move together. Computing a correlation coefficient for such a pair is misleading, because the resulting value has no meaningful interpretation. Care must therefore be taken that there is some plausible connection between the two variables before the correlation is calculated.
The correlation coefficient gives a mathematical value for measuring the strength of the linear relationship between two variables.
The correlation coefficient, denoted r, lies between -1 and +1
+1 indicates a perfect positive linear relation
-1 indicates a perfect negative linear relation
0 indicates no linear correlation
Calculation of Correlation Coefficient
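The formula for calculating the linear correlation coefficient is called the product-moment formula and was presented by Karl Pearson. The result is therefore also called the Pearsonian coefficient of correlation. The formula is given as:
$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$
where \bar{x} and \bar{y} are the means of x and y.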
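As a minimal sketch, the product-moment formula in its deviation form, r = Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²), can be computed in plain Python as follows. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the original text.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each observation from its mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    if sum_dx2 == 0 or sum_dy2 == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sum_dxdy / math.sqrt(sum_dx2 * sum_dy2)

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
print(round(pearson_r(hours, score), 3))  # 0.775
```

The function raises an error when either variable is constant, because the denominator of the formula is then zero and r is undefined.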
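Note: The magnitude of the correlation coefficient is the geometric mean of the absolute values of the two regression coefficients, and r takes their common sign, i.e.
$$ r = \pm\sqrt{b_{yx} \cdot b_{xy}} $$
where b_{yx} is the regression coefficient of y on x and b_{xy} is the regression coefficient of x on y.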
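The relation between r and the two regression coefficients can be checked numerically. The sketch below assumes the usual least-squares slopes, b_yx = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b_xy = Σ(x - x̄)(y - ȳ) / Σ(y - ȳ)². The function names and the data are hypothetical.

```python
import math

def regression_coefficients(x, y):
    """Return (b_yx, b_xy): the slope of y on x and the slope of x on y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / s_xx, s_xy / s_yy

def r_from_regression(b_yx, b_xy):
    """Geometric mean of the two slopes, carrying their common sign."""
    magnitude = math.sqrt(b_yx * b_xy)  # both slopes always share a sign
    return math.copysign(magnitude, b_yx)

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_coefficients(x, y)
print(b_yx, b_xy, round(r_from_regression(b_yx, b_xy), 3))
```

Both slopes share the sign of Σ(x - x̄)(y - ȳ), so their product is never negative and the square root is always defined.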
Scatter Diagrams for different degrees of correlation
Coefficient of Determination and Non Determination
The coefficient of determination shows the percentage of variance in one variable (say y) that is associated with the variance in the other variable (say x). It is calculated by squaring the correlation coefficient (r) and is expressed as a percentage. Suppose r = 0.40; then r square is 0.16, which indicates that 16% of the variation in y is explained by x.
The coefficient of non determination (1 - r square) indicates the proportion of variance in one variable that is independent of changes in the other. In the example above the coefficient of non determination is 1 - 0.16 = 0.84, which means that 84% of the variance in y is not explained by x.
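The arithmetic for r = 0.40 can be reproduced with a short sketch. The function name `determination` is an illustrative assumption.

```python
def determination(r):
    """Return (coefficient of determination, coefficient of non determination)."""
    r_squared = r * r
    return r_squared, 1 - r_squared

r2, non_det = determination(0.40)
# Format both shares as whole percentages
print(f"{r2:.0%} explained, {non_det:.0%} unexplained")
```

Because r is squared, r = -0.40 gives the same 16% and 84% split: the coefficient of determination carries no information about the direction of the relationship.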
Probable error is calculated to guard against false conclusions drawn from the coefficient of correlation. In most statistical investigations it is impossible to evaluate every item, so conclusions are based on a sample. The size of this sample strongly influences the results of the analysis; with a small sample, for example, wrong conclusions are quite likely. The probable error (P.E.) is therefore calculated to judge how far a correlation obtained from a sample of a given size can be relied upon.
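The formula for calculating probable error is given as:
$$ \text{P.E.} = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} $$
where r is the coefficient of correlation and n is the number of pairs of observations in the sample.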
There is no correlation between two variables if the coefficient of correlation r is less than the P.E.
Correlation exists between two variables if the coefficient of correlation r is more than P.E. However if r is less than 0.20, then the correlation is not appreciable.
The correlation is highly significant if r is more than 6 times the P.E.
The limits of correlation are r ± P.E.
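The rules above can be sketched in Python. This assumes the standard form P.E. = 0.6745 (1 - r²) / √n, which agrees with the limits reported in the worked problem that follows. The function names, the use of |r| for negative correlations, the order in which the rules are checked, and the example values are all assumptions made for illustration.

```python
import math

def probable_error(r, n):
    """Probable error of r for a sample of n pairs (assumed standard form)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

def interpret(r, n):
    """Apply the P.E. rules and return (P.E., limits, verdict)."""
    pe = probable_error(r, n)
    size = abs(r)  # assumption: the rules are applied to the magnitude of r
    if size < pe:
        verdict = "no evidence of correlation"
    elif size < 0.20:
        verdict = "not appreciable"
    elif size > 6 * pe:
        verdict = "highly significant"
    else:
        verdict = "correlation present"
    return pe, (r - pe, r + pe), verdict

# Hypothetical values, for illustration only
pe, limits, verdict = interpret(0.5, 25)
print(round(pe, 4), [round(v, 3) for v in limits], verdict)
```

Checking the 0.20 threshold before the 6 P.E. rule is a design choice: with a very large sample, a small r can exceed 6 P.E. and still be too weak to be of practical interest.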
Problem: A researcher wants to know the relation between advertisement expenditure and total sales. For this purpose he took sample data from 7 companies for one year. The data is given in the table below. Find the correlation coefficient and interpret your result.
Since r = 0.910 is greater than P.E. and also greater than 6 P.E., there is a high positive correlation between advertising expenditure and annual sales. The limits of correlation are from 0.87 to 0.95. The value of r square = 0.8281, which shows that 83% of the variance in x is associated with variation in y, or vice versa.
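The figures in this interpretation can be checked by hand. The working below assumes the standard form P.E. = 0.6745 (1 - r²) / √n, with r = 0.91 and n = 7 taken from the problem; it reproduces the stated limits.

```latex
\begin{align*}
\text{P.E.} &= 0.6745 \times \frac{1 - (0.91)^2}{\sqrt{7}}
             = 0.6745 \times \frac{0.1719}{2.6458} \approx 0.044 \\
6 \times \text{P.E.} &\approx 0.263 < 0.91 = r \\
r \pm \text{P.E.} &= 0.91 \pm 0.044 \;\Rightarrow\; 0.866 \text{ to } 0.954
  \;(\text{rounded: } 0.87 \text{ to } 0.95)
\end{align*}
```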