
  • alexis779 9 years ago + 0 comments

    What are the tricks to get a score close to 1?

    For example, max and min temperatures are correlated, so univariate interpolation is not enough. Besides doing 2D interpolation, how do you improve the estimate?

    I got a score of 0.83 using Python with:

    • interpolate.SmoothBivariateSpline from scipy to perform 2D B-spline interpolation, with kx=3 for time and ky=1 for the min temperature, when interpolating the max temperature.
    • preprocessing.StandardScaler from sklearn to standardize the features.

    See the improvement of 2D interpolation over 1D interpolation: https://drive.google.com/file/d/0BxG_aXny3IUpRnVzWlhzYXhEV1U/view?usp=sharing
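
A minimal sketch of the setup described above, not the poster's actual code. The data are synthetic (an assumption), the variable names are made up for illustration, and the features are standardized by hand with NumPy, which applies the same per-column transform as StandardScaler. The smoothing factor is left at SciPy's default.

```python
import numpy as np
from scipy.interpolate import SmoothBivariateSpline

# Synthetic monthly data (assumption, not the challenge dataset):
# a seasonal cycle in tmin, with tmax closely tracking tmin.
rng = np.random.default_rng(0)
t = np.arange(120, dtype=float)  # month index
tmin = 5.0 + 10.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1.0, t.size)
tmax = tmin + 9.0 + rng.normal(0, 0.5, t.size)

# Pretend tmax is missing for a handful of months.
missing = np.zeros(t.size, dtype=bool)
missing[[7, 31, 58, 90]] = True
known = ~missing

def standardize(values):
    """Scale to zero mean and unit variance."""
    return (values - values.mean()) / values.std()

x = standardize(t)     # time feature
y = standardize(tmin)  # min-temperature feature

# Cubic in time (kx=3), linear in min temperature (ky=1), as in the post.
# The bounding box covers all rows so the missing rows are inside it.
spline = SmoothBivariateSpline(
    x[known], y[known], tmax[known],
    bbox=[x.min(), x.max(), y.min(), y.max()],
    kx=3, ky=1,
)

# Evaluate at the (time, tmin) pairs whose tmax is missing.
estimates = spline.ev(x[missing], y[missing])
for month, est, actual in zip(t[missing], estimates, tmax[missing]):
    print(f"month {int(month):3d}: estimated {est:6.2f}, actual {actual:6.2f}")
```

Only the two input features are standardized here; the target stays in degrees, since the default smoothing factor depends on the scale of the fitted values.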


    • kits228 9 years ago + 2 comments

      Hello guys. Could anyone suggest an approach to this problem? From the Resources tab it seems we need to use interpolation, but I didn't get much information from the post. Could anyone explain it in detail?


      • diogojapinto Asked to answer 9 years ago + 0 comments

        I performed a linear regression over the data points, based on the index (I'll explain what I mean next) and polynomials of degree up to 3 of the non-missing feature, for each point in which a feature is missing.

        You just have to be careful when determining the index. I used the pandas library (in Python) to work with dates. There, I found that there was no data for a couple of months, so be careful with that.

        Hope this helps a little bit ;)
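
One way to read the description above, sketched on synthetic data with NumPy instead of pandas. It assumes that "index" means the number of months elapsed since the first row; the dates, temperatures, and the `design` helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical monthly record with two months absent from the data.
dates = [(y, m) for y in range(2000, 2005) for m in range(1, 13)]
dates = [d for d in dates if d not in {(2002, 3), (2002, 4)}]

# Time index = months elapsed since the first row, computed from the dates
# rather than from row position, so the gap in the record is preserved.
y0, m0 = dates[0]
index = np.array([(y - y0) * 12 + (m - m0) for y, m in dates], dtype=float)

# Synthetic temperatures (assumption): seasonal tmin, tmax tracking tmin.
tmin = 6.0 + 8.0 * np.sin(2 * np.pi * index / 12) + rng.normal(0, 1.0, index.size)
tmax = tmin + 8.0 + rng.normal(0, 0.3, index.size)

# Pretend tmax is missing in three rows.
missing = np.zeros(index.size, dtype=bool)
missing[[5, 20, 41]] = True

def design(idx, other):
    """Columns: intercept, time index, and powers 1..3 of the known feature."""
    return np.column_stack([np.ones_like(idx), idx, other, other**2, other**3])

# Ordinary least squares on the rows where tmax is known.
coef, *_ = np.linalg.lstsq(
    design(index[~missing], tmin[~missing]), tmax[~missing], rcond=None
)

# Predict the missing tmax values from the time index and the known tmin.
predicted = design(index[missing], tmin[missing]) @ coef
for i, est, actual in zip(index[missing], predicted, tmax[missing]):
    print(f"index {int(i):2d}: predicted {est:6.2f}, actual {actual:6.2f}")
```

Using row position as the index would shift every sample after the gap by two months, which is the pitfall the post warns about.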


        • AffineStructure 9 years ago + 1 comment

          https://en.wikipedia.org/wiki/Spline_interpolation


          • luizacc 9 years ago + 0 comments

            Great tip! I ran into a strange problem. I was using the difference in months from the first sample as the time value (that is, the first sample was 0, tmin, tmax; the next was 1, tmin', tmax'; and so on).

            For 20 samples, I could calculate the missing temperatures. However, with the 400+ samples, the calculation resulted in "nan".

            Because I thought my reasoning was correct, I decided, without much hope, to change the scale to days instead of months (that is, the first sample was 0, tmin, tmax; the next was 30, tmin', tmax'; and so on). Voilà! I could calculate the missing temperatures again, and my code was accepted.

            The problem is that I cannot fully explain why increasing the "time scale" stopped the "nan" results. Could someone explain to me why it worked? Thanks.
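
For reference, both time scales mentioned above can be derived from calendar dates with the standard library. This is only a sketch with hypothetical (year, month) labels. It uses exact day counts rather than the 30-days-per-month approximation from the post, and it does not by itself explain the "nan" results.

```python
from datetime import date

# Hypothetical (year, month) labels; the March 1908 row is absent on purpose.
samples = [(1908, 1), (1908, 2), (1908, 4), (1908, 5)]

# Treat each sample as falling on the first day of its month.
start = date(*samples[0], 1)

# Months elapsed since the first sample.
month_index = [(y - start.year) * 12 + (m - start.month) for y, m in samples]

# Days elapsed since the first sample.
day_index = [(date(y, m, 1) - start).days for y, m in samples]

print(month_index)  # [0, 1, 3, 4]
print(day_index)    # [0, 31, 91, 121]
```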

