Simple Linear Regression And Correlation
Q.4.3) Explain the concept of regression and correlation, and describe the main properties of Karl-Pearson’s coefficient of correlation.
Answer:
Regression:
Regression is the relationship between one dependent variable and one or more independent variables. If there is only one independent variable, the regression is said to be simple; otherwise it is said to be multiple. For example, the yield of wheat depends upon fertilizer, seeds, water, etc. In this example, the yield of wheat is the dependent variable, while fertilizer, seeds and water are the independent variables. The values of the independent variables are assumed to be fixed, that is, they are not random variables. The dependent variable, on the other hand, whose values are determined based on the independent variables, is a random variable.
Correlation:
Correlation is the interdependence between two or more variables. In correlation, it is assumed that all the variables are random variables. The correlation between two variables is said to be simple; when more than two variables are involved, it is said to be multiple. Two variables are said to be correlated when an increase (or decrease) in one variable corresponds to an increase or decrease in the other variable. If both variables increase or decrease together, the correlation is positive; if they move in opposite directions, the correlation is negative.
Properties of Karl-Pearson correlation coefficient.
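- It is symmetrical with respect to the variables. That is, rxy = ryx.
- It is independent of origin and scale. That is, rxy = ruv, where u = (x − a)/h and v = (y − c)/k, with h and k positive constants.
- It is a pure number that lies between −1 and +1.
- It is the geometric mean of the two regression coefficients. That is, r = ±√(byx × bxy), taking the sign common to both coefficients.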
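The properties listed above can be checked numerically. The following is a minimal sketch in Python, assuming a small illustrative data set that does not come from these notes; the function names `pearson_r` and `slope` and the shift and scale constants are chosen here only for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares regression coefficient of ys on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Illustrative data (an assumption, not taken from the notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Symmetry: r is the same whichever variable comes first.
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)

# Change of origin and scale with positive scale factors leaves r unchanged.
u = [(xi - 3) / 2 for xi in x]
v = [(yi - 4) / 5 for yi in y]
r_uv = pearson_r(u, v)

# Geometric mean of the two regression coefficients (both positive here).
b_yx = slope(x, y)
b_xy = slope(y, x)
geometric_mean = sqrt(b_yx * b_xy)

print(r_xy, r_yx, r_uv, geometric_mean)
```

All four printed values should agree up to rounding error, and each lies between −1 and +1.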
Q.4.4) What is a scatter diagram? Explain the meaning of the regression of Y on X and X on Y.
Answer:
A scatter diagram is a plot of bivariate data on graph paper. With the help of a scatter diagram, the relationship between two variables can be studied. Values of the independent variable are shown on the horizontal axis and the corresponding values of the dependent variable on the vertical axis. The scatter diagram provides an overview of the data and enables us to draw a preliminary conclusion about a possible relationship between the variables.
Regression line of y on x: this is the line that gives the best possible mean values of y for given values of x. It describes the variation in the values of y for a given variation in the values of x. In algebraic notation, the equation of the regression of y on x is Y = a + bX.
Regression line of x on y: this is the line that gives the best possible mean values of x for given values of y, i.e. X = c + dY, where the constants c and d are in general different from a and b in the line of y on x.
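To see that the two regression lines are generally different, the sketch below fits both to the same small data set. The data and the helper name `fit_line` are assumptions made for illustration; the constants of the line of x on y are written c and d here to keep them apart from a and b.

```python
def fit_line(xs, ys):
    """Least-squares line ys = intercept + slope * xs; returns (intercept, slope)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope

# Illustrative data (an assumption, not taken from the notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)  # regression of y on x: Y = a + bX
c, d = fit_line(y, x)  # regression of x on y: X = c + dY

print(f"y on x: Y = {a:.2f} + {b:.2f} X")
print(f"x on y: X = {c:.2f} + {d:.2f} Y")

# Solving Y = a + bX for X gives a slope of 1/b, which differs from d
# unless the correlation is perfect.
print(1 / b, d)
```

For this data the line of y on x has slope 0.6, so solving it for X would give a slope of about 1.67, while the fitted line of x on y has slope 1. Both lines pass through the point of means.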
Q.4.5) What is meant by the least-squares method? Describe briefly the properties of the least-squares regression line.
Answer:
The method of least squares is an objective and efficient method of determining the best-fitting curve. It provides an estimated regression equation that minimizes the sum of squared deviations between the observed values of the dependent variable and its estimated values. For a linear relationship, it finds the straight line that best approximates the relationship between the independent and dependent variables.
PROPERTIES OF LEAST SQUARE REGRESSION LINE:
- The sum of squares of the deviations from the regression line is a minimum.
- The regression line always passes through the point of means of the variables, (X̄, Ȳ).
- The sum of the deviations of the actual values from the regression line is always equal to zero. That is, ∑(Y − Ŷ) = 0.
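These three properties can be verified on a small example. The sketch below is only an illustration under assumed data; `fit_line` and `sse` are hypothetical helper names, and the minimum is checked only against a handful of nearby lines, not proved in general.

```python
def fit_line(xs, ys):
    """Least-squares line ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared deviations of ys from the line a + b * xs."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative data (an assumption, not taken from the notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)

# Property: the deviations from the fitted line sum to zero.
residual_sum = sum(yi - (a + b * xi) for xi, yi in zip(x, y))

# Property: the line passes through the point of means.
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
value_at_mean_x = a + b * mean_x

# Property: shifting the intercept or slope slightly increases the
# sum of squared deviations.
best = sse(a, b, x, y)
nearby = [
    sse(a + da, b + db, x, y)
    for da in (-0.1, 0, 0.1)
    for db in (-0.1, 0, 0.1)
    if (da, db) != (0, 0)
]

print(residual_sum, value_at_mean_x, mean_y, best, min(nearby))
```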
Q.4.6) Differentiate between regression and correlation, giving examples.
Answer:
Regression:
Regression is the relationship between one dependent variable and one or more independent variables. It provides an equation for estimating or predicting the average value of the dependent variable from known values of the independent variable. In regression, the dependent variable is assumed to be a random variable, whereas the independent variables are assumed to have fixed values, i.e. they are chosen non-randomly. For example, the yield of wheat depends upon the amount of fertilizer, etc.
Correlation:
Correlation is a measure of the degree to which two variables vary together. For example, the length of an iron bar increases as the temperature increases. The main difference between correlation and regression is that in correlation both variables are random, whereas in regression one is random and the other is non-random.
Q.4.7) Explain with suitable examples the concept of correlation coefficient between two variables.
Answer:
Correlation is measured by the coefficient “r”, derived by Karl Pearson. The value of r lies between −1 and +1. The magnitude of r indicates the strength of the linear relationship, while its sign indicates the direction. If r = +1, there is perfect positive correlation. If r = −1, there is perfect negative correlation. If r = 0, there is no linear relationship, i.e. no correlation.
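The three special cases can be illustrated with a short sketch. The data sets below are assumptions constructed for the purpose: two exact straight lines and one symmetric curve. The last case shows that r = 0 rules out only a linear relationship, since y there depends exactly on x.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for paired values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]

# Constructed examples (assumptions for illustration only).
y_rising = [2 * xi + 1 for xi in x]    # exact line with positive slope
y_falling = [10 - 3 * xi for xi in x]  # exact line with negative slope
y_curve = [(xi - 3) ** 2 for xi in x]  # symmetric curve, no linear trend

r_rising = pearson_r(x, y_rising)
r_falling = pearson_r(x, y_falling)
r_curve = pearson_r(x, y_curve)

print(r_rising, r_falling, r_curve)
```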
Q.4.8) Given the bivariate data:
X 1 5 3 2 1 1 7 3
Y 6 1 0 0 1 2 1 5
Find the regression line of Y on X and hence predict Y if X = 10.
SOLUTION:
X | Y | XY | X² |
1 | 6 | 6 | 1 |
5 | 1 | 5 | 25 |
3 | 0 | 0 | 9 |
2 | 0 | 0 | 4 |
1 | 1 | 1 | 1 |
1 | 2 | 2 | 1 |
7 | 1 | 7 | 49 |
3 | 5 | 15 | 9 |
∑X = 23 | ∑Y = 16 | ∑XY = 36 | ∑X² = 99 |
The regression line of y on x is
ŷ = a + bx
where
b = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)
b = (8(36) − (23)(16)) / (8(99) − (23)²)
b = (288 − 368) / (792 − 529)
b = −80/263
b = −0.30
Now:
a = (∑y − b∑x) / n
a = (16 − (−0.30)(23)) / 8
a = (16 + 6.9) / 8
a = 22.9/8
a = 2.86
Putting the values of a and b into the line:
ŷ = a + bx
ŷ = 2.86 − 0.30x
If x = 10, then
ŷ = 2.86 − 0.30(10)
ŷ = 2.86 − 3
ŷ = −0.14 (answer)
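The hand calculation can be cross-checked with a few lines of Python. This is a sketch using the data of Q.4.8 and the same formulas; the variable names are chosen here for illustration. Because it does not round b to two decimals before computing a, its intercept and prediction differ slightly from the hand-worked figures (about 2.87 and −0.17 instead of 2.86 and −0.14).

```python
# Data from Q.4.8.
x = [1, 5, 3, 2, 1, 1, 7, 3]
y = [6, 1, 0, 0, 1, 2, 1, 5]

n = len(x)
sum_x = sum(x)                                 # 23
sum_y = sum(y)                                 # 16
sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # 36
sum_x2 = sum(xi * xi for xi in x)              # 99

# Slope and intercept of the regression line of y on x.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Predicted value of y at x = 10, using the unrounded coefficients.
y_hat_10 = a + b * 10

print(f"b = {b:.4f}, a = {a:.4f}, predicted y at x = 10: {y_hat_10:.4f}")
```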