Regression Line
If our data shows a linear relationship between x and y, then the straight line which best describes the relationship is the regression line. The regression line is given by ŷ = b₀ + b₁x, where b₀ is the intercept and b₁ is the slope.
Finding the Value of b₁
The value of b₁ can be calculated using either of the following formulae:
- b₁ = r(sy/sx), where r is the Pearson correlation coefficient, sx is the standard deviation of x and sy is the standard deviation of y.
- b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)², where x̄ is the mean of x and ȳ is the mean of y.
Finding the Value of b₀
b₀ = ȳ − b₁x̄, where x̄ is the mean of x and ȳ is the mean of y.
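The two slope formulae agree, and the intercept follows from the means. A quick numerical sketch in Python, using the same x and y values as the R and scikit-learn examples later in this section:

```python
import numpy as np

# Data from the code examples later in this section
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)

# Formula 1: b1 = r * (sy / sx)
r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
b1_via_r = r * (np.std(y, ddof=1) / np.std(x, ddof=1))

# Formula 2: b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
b1_via_sums = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Intercept: b0 = y_bar - b1 * x_bar
b0 = y.mean() - b1_via_sums * x.mean()

print(b1_via_r, b1_via_sums, b0)  # both slopes ≈ 0.8, intercept ≈ 0.6
```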
Sums of Squares
- Total sum of squares: SST = Σ(yᵢ − ȳ)²
- Regression sum of squares: SSR = Σ(ŷᵢ − ȳ)²
- Error sum of squares: SSE = Σ(yᵢ − ŷᵢ)²
If SSE is small relative to SST, we can conclude that our fit is good.
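A short Python sketch of the decomposition, using the same x and y values as the code examples below (for a least-squares fit, SST = SSR + SSE):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)

# Least-squares fit: y_hat = b0 + b1 * x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # error sum of squares

print(sst, ssr, sse)  # SST ≈ 10.0, SSR ≈ 6.4, SSE ≈ 3.6
```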
Coefficient of Determination (R-squared)
The coefficient of determination is R² = SSR/SST (for simple linear regression, this equals r², the square of the Pearson correlation coefficient). R² multiplied by 100 gives the percent of variation in y attributed to the linear regression between x and y.
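A minimal check in Python that squaring the Pearson correlation coefficient gives R², again using the data from the code examples below:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 1, 4, 3, 5], dtype=float)

r = np.corrcoef(x, y)[0, 1]  # Pearson correlation coefficient
r_squared = r ** 2

print(r_squared)  # ≈ 0.64, i.e. about 64% of the variation in y is explained by x
```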
Example
Let's consider the following data sets:

x: 1, 2, 3, 4, 5
y: 2, 1, 4, 3, 5

So, x̄ = (1 + 2 + 3 + 4 + 5)/5 = 3 and ȳ = (2 + 1 + 4 + 3 + 5)/5 = 3.

Now we can compute the values of b₁ and b₀:

b₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² = 8/10 = 0.8

And,

b₀ = ȳ − b₁x̄ = 3 − 0.8 × 3 = 0.6

So, the regression line is ŷ = 0.6 + 0.8x.
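The hand computation above can be checked with np.polyfit, which returns the coefficients of a degree-1 least-squares fit:

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

# polyfit returns coefficients from highest degree down: [slope, intercept]
b1, b0 = np.polyfit(x, y, 1)
print(b0, b1)  # ≈ 0.6, 0.8
```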
Linear Regression in R
We can use the lm function to fit a linear model.
x = c(1, 2, 3, 4, 5)
y = c(2, 1, 4, 3, 5)
m = lm(y ~ x)
summary(m)
Running the above code produces the following output:
Call:
lm(formula = y ~ x)
Residuals:
1 2 3 4 5
0.6 -1.2 1.0 -0.8 0.4
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.6000 1.1489 0.522 0.638
x 0.8000 0.3464 2.309 0.104
Residual standard error: 1.095 on 3 degrees of freedom
Multiple R-squared: 0.64, Adjusted R-squared: 0.52
F-statistic: 5.333 on 1 and 3 DF, p-value: 0.1041
If we want information for coefficients only, we can use the following code:
x = c(1, 2, 3, 4, 5)
y = c(2, 1, 4, 3, 5)
lm(y ~ x)
Running the above code produces the following output:
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
0.6 0.8
Linear Regression in Python
We can use the fit method of the sklearn.linear_model.LinearRegression class.
from sklearn import linear_model
import numpy as np
xl = [1, 2, 3, 4, 5]
x = np.asarray(xl).reshape(-1, 1)
y = [2, 1, 4, 3, 5]
lm = linear_model.LinearRegression()
lm.fit(x, y)
print(lm.intercept_)
print(lm.coef_[0])
Running the above code produces the following output:
0.6
0.8
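Once fitted, the model can also be used for prediction: predict returns ŷ for new x values and score returns R² (both are standard scikit-learn methods):

```python
from sklearn import linear_model
import numpy as np

x = np.asarray([1, 2, 3, 4, 5]).reshape(-1, 1)
y = [2, 1, 4, 3, 5]

lm = linear_model.LinearRegression()
lm.fit(x, y)

# y_hat = 0.6 + 0.8 * x evaluated at x = 6 and x = 7
print(lm.predict(np.asarray([[6], [7]])))  # ≈ [5.4, 6.2]
print(lm.score(x, y))                      # R-squared on the training data ≈ 0.64
```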