Regression Line
If our data shows a linear relationship between x and y, then the straight line which best describes the relationship is the regression line. The regression line is given by ŷ = b₀ + b₁x, where b₀ is the intercept and b₁ is the slope.
Finding the Value of b₁
The value of b₁ can be calculated using either of the following formulae:
- b₁ = r(sy/sx), where r is the Pearson correlation coefficient, sx is the standard deviation of x and sy is the standard deviation of y.
- b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², where x̄ is the mean of x and ȳ is the mean of y.
Finding the Value of b₀
b₀ = ȳ − b₁x̄, where x̄ is the mean of x and ȳ is the mean of y.
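The two slope formulae agree, and the intercept follows from the means. A quick numerical sketch in Python, using the same x and y values as the R and scikit-learn examples later in this section:

```python
import numpy as np

# Data from the code examples later in this section
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)

# Formula 1: b1 = r * (sy / sx)
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
b1_via_r = r * (np.std(y, ddof=1) / np.std(x, ddof=1))

# Formula 2: b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
b1_via_sums = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Intercept: b0 = y_bar - b1 * x_bar
b0 = y.mean() - b1_via_sums * x.mean()

print(b1_via_r, b1_via_sums, b0)  # both slopes ≈ 0.8, intercept ≈ 0.6
```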
Sums of Squares
- Total sum of squares: SST = Σ(yᵢ − ȳ)²
- Regression sum of squares: SSR = Σ(ŷᵢ − ȳ)²
- Error sum of squares: SSE = Σ(yᵢ − ŷᵢ)²
If SSE is small relative to SST, we can conclude that our fit is good.
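A short Python sketch of the decomposition, using the same x and y values as the code examples below (for a least-squares fit, SST = SSR + SSE):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)

# Least-squares fit: y_hat = b0 + b1 * x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares

print(sst, ssr, sse)  # SST ≈ 10.0, SSR ≈ 6.4, SSE ≈ 3.6
```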
Coefficient of Determination (R-squared)
The coefficient of determination is R² = SSR/SST (for simple linear regression, this equals r², the square of the Pearson correlation coefficient). R² multiplied by 100 gives the percent of variation in y attributed to the linear regression between x and y.
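A minimal check in Python that squaring the Pearson correlation coefficient gives R², again using the data from the code examples below:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
r_squared = r ** 2

print(r_squared)  # ≈ 0.64, i.e. about 64% of the variation in y is explained by x
```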
Example
Let's consider the following data sets:

x: 1, 2, 3, 4, 5
y: 2, 1, 4, 3, 5

So, x̄ = (1 + 2 + 3 + 4 + 5)/5 = 3 and ȳ = (2 + 1 + 4 + 3 + 5)/5 = 3.

Now we can compute the values of b₁ and b₀:

b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 8/10 = 0.8

And,

b₀ = ȳ − b₁x̄ = 3 − 0.8 × 3 = 0.6

So, the regression line is ŷ = 0.6 + 0.8x.
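The hand computation above can be checked with np.polyfit, which returns the coefficients of a degree-1 least-squares fit:

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# polyfit returns coefficients from highest degree down: [slope, intercept]
b1, b0 = np.polyfit(x, y, 1)
print(b0, b1)  # ≈ 0.6, 0.8
```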
Linear Regression in R
We can use the lm function to fit a linear model.
x = c(1, 2, 3, 4, 5)
y = c(2, 1, 4, 3, 5)
m = lm(y ~ x)
summary(m)
Running the above code produces the following output:
Call:
lm(formula = y ~ x)
Residuals:
1 2 3 4 5
0.6 -1.2 1.0 -0.8 0.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6000 1.1489 0.522 0.638
x 0.8000 0.3464 2.309 0.104
Residual standard error: 1.095 on 3 degrees of freedom
Multiple R-squared: 0.64, Adjusted R-squared: 0.52
F-statistic: 5.333 on 1 and 3 DF, p-value: 0.1041
If we want information for coefficients only, we can use the following code:
x = c(1, 2, 3, 4, 5)
y = c(2, 1, 4, 3, 5)
lm(y ~ x)
Running the above code produces the following output:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
0.6 0.8
Linear Regression in Python
We can use the fit method of the sklearn.linear_model.LinearRegression class.
from sklearn import linear_model
import numpy as np
xl = [1, 2, 3, 4, 5]
x = np.asarray(xl).reshape(-1, 1)
y = [2, 1, 4, 3, 5]
lm = linear_model.LinearRegression()
lm.fit(x, y)
print(lm.intercept_)
print(lm.coef_[0])
Running the above code produces the following output:
0.6
0.8
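Once fitted, the model can also be used for prediction: predict returns ŷ for new x values and score returns R² (both are standard scikit-learn methods):

```python
from sklearn import linear_model
import numpy as np

x = np.asarray([1, 2, 3, 4, 5]).reshape(-1, 1)
y = [2, 1, 4, 3, 5]

lm = linear_model.LinearRegression()
lm.fit(x, y)

# y_hat = 0.6 + 0.8 * x evaluated at x = 6 and x = 7
print(lm.predict(np.asarray([[6], [7]])))  # ≈ [5.4, 6.2]
print(lm.score(x, y))                      # R-squared on the training data ≈ 0.64
```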