- Day 6: Correlation and Regression Lines #1
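import math
# Slopes of the two regression lines: y on x (byx) and x on y (bxy)
byx = -3/4
bxy = -3/4
# The square root of the product of the slopes gives |r|
result = math.sqrt(byx*bxy)
# Both slopes are negative, so r takes the negative sign
print(round(-1*result, 2))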
Whenever the correlation coefficient is positive, the slope of the regression line is positive, and whenever it is negative, the slope is negative.
In the formula a = r(sy/sx), sy and sx are always positive, so a has the same sign as the correlation coefficient.
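To make the sign argument concrete, here is a minimal sketch in plain Python. The data and the helper name slope_and_r are made up for illustration; the helper computes the slope as a = r*(sy/sx), so one can check that a and r share a sign for both an increasing and a decreasing sample.

```python
import math

def slope_and_r(xs, ys):
    """Return (a, r): slope of y regressed on x, and Pearson's r (hypothetical helper)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    sx = math.sqrt(sxx / n)  # standard deviations are never negative
    sy = math.sqrt(syy / n)
    a = r * (sy / sx)        # so a inherits the sign of r
    return a, r

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys_down = [9.0, 7.5, 6.0, 3.5, 2.0]  # made-up decreasing sample
ys_up = [2.0, 3.5, 6.0, 7.5, 9.0]    # made-up increasing sample

a_down, r_down = slope_and_r(xs, ys_down)
a_up, r_up = slope_and_r(xs, ys_up)
print(a_down, r_down)
print(a_up, r_up)
```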
A useful piece of information is the definition of b_xy—this is the slope of the regression line resulting from regressing x on y.
I found Wikipedia's "geometric interpretation" to be helpful here, i.e.
r = 1/cos(delta) - tan(delta)
where delta is the angle between the given regression lines: https://en.wikipedia.org/wiki/Pearson_product-moment_correlation_coefficient#Geometric_interpretation
With the acute angle between the lines this gives the magnitude of r; the sign comes from the slopes.
(Delta is easy to calculate from the y = f(x) = ax + b form of both regression lines, as a = tan(alpha), where alpha is the angle of the line w.r.t. the x axis.)
Personal pitfalls: forgetting to get the square root, forgetting to consider the sign (+ or -) of the result.
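The angle calculation can be sketched as follows. This is only an illustration: the slopes -3/4 and -4/3 assume the two lines of this problem, with the x-on-y line x = -3/4*y - 7/4 rewritten as y = -4/3*x - 7/3, and the variable names are my own.

```python
import math

# Slopes of both lines in y = a*x + b form (assumed from the problem):
# y = -3/4*x - 2, and x = -3/4*y - 7/4 rewritten as y = -4/3*x - 7/3.
a1 = -3 / 4
a2 = -4 / 3

# a = tan(alpha), so each line's angle to the x axis is atan(a).
alpha1 = math.atan(a1)
alpha2 = math.atan(a2)
delta = abs(alpha1 - alpha2)  # acute angle between the two lines

# Evaluate 1/cos(delta) - tan(delta) for the acute angle.
magnitude = 1 / math.cos(delta) - math.tan(delta)

# The acute angle carries no sign information, so take the sign from the slope.
r = math.copysign(magnitude, a1)
print(round(math.degrees(delta), 2), round(r, 2))
```

Here delta comes out at roughly 16.26 degrees, and the sign has to be applied as a separate step, which is exactly the second pitfall mentioned above.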
The way to use simulation is to add information to the problem. I think the key to this question is understanding the math behind linear regression and the Pearson correlation.
From the problem statement, we know that y = -3/4*x - 2 + e (1) and x = -3/4*y - 7/4 + e (2).
The linear regression coefficient beta (in the y = beta*x + e format) is Sxy/Sxx, where S denotes a sum of products of deviations from the mean (please check the linear regression lecture notes).
Applying this formula to our problem, we get Sxy/Sxx = -3/4 from (1) ... (3), and Syx/Syy = -3/4 from (2) ... (4).
Recall that the Pearson correlation is r = Sxy/(sqrt(Sxx)*sqrt(Syy)), and that Sxy = Syx.
So, if we multiply (3) and (4) and take the square root of the result, we get sqrt(Sxy*Syx/(Sxx*Syy)) = |r| = 3/4, i.e. r = +3/4 or -3/4.
Since y and x are negatively correlated (the betas are negative), we get r = -0.75.
Please excuse me if there are some typos.
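The derivation above can be checked numerically on any sample. The following is a minimal sketch with made-up data (not the data behind the problem); it computes Sxx, Syy and Sxy directly and confirms that the signed square root of the product of the two slopes equals r.

```python
import math

# Made-up sample, for illustration only.
xs = [1.0, 2.0, 4.0, 5.0, 8.0]
ys = [7.0, 6.5, 4.0, 4.5, 1.0]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
Sxx = sum((x - mx) ** 2 for x in xs)
Syy = sum((y - my) ** 2 for y in ys)
# Each term is a product of two deviations, so swapping x and y
# changes nothing: Sxy == Syx.
Sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

b_yx = Sxy / Sxx  # slope of y regressed on x, as in (3)
b_xy = Sxy / Syy  # slope of x regressed on y, as in (4)
r = Sxy / (math.sqrt(Sxx) * math.sqrt(Syy))

# Multiplying (3) and (4) gives r squared; the sign of r is the sign of the slopes.
r_from_slopes = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(r, r_from_slopes)
```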
thanks, it was helpful
Why is Sxy = Syx?
I don't see why this isn't 0.96, I get the angle as 16.26 with a cos of 0.96.
Yeah, actually I don't have a clue about correlation and regression. Cosine similarity is perhaps not the right tool here, but...
You're assuming that the data has been centered.
So, this code gives -0.68, which is wrong.
#!/usr/bin/python
import random
import scipy.stats

x = []
y = []
for _ in range(1000000):
    z = random.random()
    x.append(z)
    y.append((-3*z - 8) / 4.0)
for _ in range(1000000):
    z = random.random()
    x.append(z)
    y.append((-4*z - 7) / 3.0)

a, b = scipy.stats.pearsonr(x, y)
print('%.2f' % (a))
Eh, so, I was able to solve this problem.
When scipy.stats.linregress(x,y)[0] and scipy.stats.linregress(y,x)[0] are both -0.75, the answer is scipy.stats.pearsonr(x,y)[0].
The problem statement is too difficult to understand...
For centered data the correlation coefficient is the cosine of the angle, but for un-centered data, it's the secant minus the tangent. [1]
I get the wrong sign when I do it that way, though.