Day 7: Temperature Predictions
-
alexis779 9 years ago What are the tricks to get a score close to 1?
For example, max and min temperatures are correlated, so univariate interpolation is not enough. Besides doing 2D interpolation, how do you improve the estimate?
I got a score of 0.83 using Python with:
interpolate.SmoothBivariateSpline from scipy, to perform 2D B-spline interpolation with kx=3 for the time and ky=1 for the min temperature when interpolating the max temperature
preprocessing.StandardScaler from sklearn, to standardize the features
See the improvement of 2d interpolation over 1d interpolation https://drive.google.com/file/d/0BxG_aXny3IUpRnVzWlhzYXhEV1U/view?usp=sharing
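For readers who want to try this, here is a minimal sketch of the setup described above. It is not the poster's actual code: the data is synthetic, the names (t, tmin, tmax, standardize) are made up for illustration, and the standardization is done by hand with numpy (zero mean, unit variance, which is what StandardScaler computes by default) to keep the example short.

```python
import numpy as np
from scipy.interpolate import SmoothBivariateSpline

# Synthetic monthly series (assumption: not the challenge data).
rng = np.random.default_rng(0)
t = np.arange(120, dtype=float)                      # time, in months
tmin = 5 + 8 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1.5, t.size)
tmax = tmin + 9 + rng.normal(0, 0.5, t.size)         # correlated with tmin

# Pretend tmax is missing at a few rows and must be estimated.
missing = np.array([10, 40, 75])
known = np.setdiff1d(np.arange(t.size), missing)

def standardize(values, reference):
    """Scale to zero mean and unit variance using the known rows only."""
    return (values - reference.mean()) / reference.std()

x = standardize(t[known], t[known])
y = standardize(tmin[known], tmin[known])

# Cubic in time (kx=3), linear in min temperature (ky=1).
spline = SmoothBivariateSpline(x, y, tmax[known], kx=3, ky=1)

# Evaluate at the rows where tmax is missing.
pred = spline.ev(
    standardize(t[missing], t[known]),
    standardize(tmin[missing], tmin[known]),
)
print(pred, tmax[missing])
```

To interpolate a missing min temperature instead, swap the roles of tmin and tmax. The smoothing factor s is left at its default here; tuning it is one of the knobs that may affect the score.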
-
kits228 9 years ago Hello guys. Could anyone suggest an approach to this problem? From the resources tab it seems we need to use interpolation, but I didn't get much information from the post. Could anyone explain it to me in detail?
-
diogojapinto Asked to answer 9 years ago For each point in which a feature is missing, I performed a linear regression over the data points, using as inputs the index (I'll explain what I mean next) and polynomials of degree up to 3 of the non-missing feature.
You just have to be careful when determining the index. I used the pandas library (in Python) to work with dates. There, I found that there was no data for a couple of months, so the index has to account for those gaps.
Hope this helps a little bit ;)
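A rough sketch of that regression, for anyone who wants something concrete to start from. This is one reading of the description above, not the original code: the data is synthetic, the helper names (design_matrix, fit_predict) are hypothetical, and the index is taken to be elapsed months since the first sample, so a gap in the data leaves a hole in the index rather than being renumbered away.

```python
import numpy as np

def design_matrix(index, known):
    """Columns: intercept, time index, known, known^2, known^3."""
    index = np.asarray(index, dtype=float)
    known = np.asarray(known, dtype=float)
    return np.column_stack(
        [np.ones_like(index), index, known, known**2, known**3]
    )

def fit_predict(index, known, target, index_new, known_new):
    """Least-squares fit on complete rows, then predict the missing feature."""
    X = design_matrix(index, known)
    # Scale every column except the intercept so the fit stays well conditioned.
    mean = X[:, 1:].mean(axis=0)
    std = X[:, 1:].std(axis=0)
    X[:, 1:] = (X[:, 1:] - mean) / std
    coef, *_ = np.linalg.lstsq(X, np.asarray(target, dtype=float), rcond=None)

    X_new = design_matrix(index_new, known_new)
    X_new[:, 1:] = (X_new[:, 1:] - mean) / std
    return X_new @ coef

# Synthetic example (assumption): 48 months with months 20 and 21 absent.
months = np.arange(48)
index = months[~np.isin(months, [20, 21])].astype(float)
tmin = 5 + 10 * np.sin(2 * np.pi * index / 12)
tmax = tmin + 8 + 0.01 * index

# Estimate tmax for month 20, where only tmin is available.
tmin_20 = 5 + 10 * np.sin(2 * np.pi * 20 / 12)
pred = fit_predict(index, tmin, tmax, [20.0], [tmin_20])
expected = tmin_20 + 8 + 0.01 * 20
print(pred[0], expected)
```

With real dates, the elapsed-month index can be computed as (year - first_year) * 12 + (month - first_month), which keeps the spacing correct across missing months.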
-
-
luizacc 9 years ago Great tip! I ran into a strange problem. I was using the difference in months from the first data point as the index (that is, the first sample was 0,tmin,tmax; the next was 1,tmin',tmax'; and so on).
For 20 samples, I could calculate the missing temperatures. However, with the 400+ samples, the calculation resulted in "nan".
Because I thought my reasoning was correct, I decided, without much hope, to change the scale to days instead of months (that is, the first sample was 0,tmin,tmax; the next was 30,tmin',tmax'; and so on). Voila! I could calculate the missing temperatures again, and my code was accepted.
The problem is that I cannot fully explain why increasing the time scale stopped the "nan" results. Could someone explain to me why it worked? Thanks
-
-