Day 6: Multiple Linear Regression: Predicting House Prices

  • alexey_filippov 9 years ago + 2 comments

    R is an amazing language indeed: I struggle to write a concatenation, but a linear model is so easy! Call lm, and you're done! And there's even more!

    Still, even reading stdin is a pain. On the plus side, all the stats stuff just works.

    • AffineStructure 9 years ago + 2 comments

      Writing this in R was frustrating. Some things I learned on my way:

      1. How to read in data with readLines and use strsplit to put the data neatly into data frames.

      2. How to extract the coefficients from lm and multiply them with the data frames.

      3. I struggled with how to properly call lm without a fixed number of vectors, since I had problems with the data type. I finally realized that lm can take a matrix as input, so we had to do something like lm(df1[[length(df1)]] ~ as.matrix(df1[-length(df1)])).

      4. How to print the answers properly, since R puts an annoying index at the beginning of every print. I used a for loop and the cat function with a line break. HackerRank received the answer nicely that way.
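
      The four points above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the input lines are a made-up sample in the challenge's layout (a real submission would read them with readLines(file("stdin"))), and the names parse_rows, train, queries, t_count and preds are hypothetical.

      ```r
      # Made-up sample input: "f n", then n rows of f features plus price,
      # then the query count, then that many rows of f features.
      lines <- c("2 4",
                 "0 0 1",
                 "1 0 3",
                 "0 1 4",
                 "1 1 6",
                 "2",
                 "2 2",
                 "3 1")

      # Hypothetical helper: split each line on spaces, stack the numbers as matrix rows.
      parse_rows <- function(x) {
        do.call(rbind, lapply(strsplit(x, " ", fixed = TRUE), as.numeric))
      }

      header  <- as.numeric(strsplit(lines[1], " ", fixed = TRUE)[[1]])
      f       <- header[1]
      n       <- header[2]
      train   <- parse_rows(lines[2:(n + 1)])
      t_count <- as.numeric(lines[n + 2])
      queries <- parse_rows(lines[(n + 3):(n + 2 + t_count)])

      # A matrix on the right-hand side lets lm handle any number of features.
      x   <- train[, 1:f, drop = FALSE]
      y   <- train[, f + 1]
      fit <- lm(y ~ x)

      # coef(fit) holds the intercept first, then one slope per feature.
      preds <- as.vector(cbind(1, queries) %*% coef(fit))

      # cat prints without the "[1]" index prefix.
      for (p in preds) cat(format(round(p, 2), nsmall = 2), "\n", sep = "")
      ```

      The sample prices were chosen to follow price = 1 + 2*x1 + 3*x2 exactly, which makes the fitted coefficients easy to check by eye.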

      • BryanRJ 9 years ago + 1 comment

        Or you can do "lm(price ~ ., data=foo)", which will create a linear regression model having price as the dependent (response) variable and all other columns in foo as independent variables (predictors).

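        Re: printing, try "write(foo, stdout())", which prints the raw values without the index prefix. For numbers it puts five values on a line by default; pass ncolumns=1 to get one value per line.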
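
        A small sketch of both suggestions, assuming a made-up data frame foo whose columns area and rooms are hypothetical predictors and whose price column is the response:

        ```r
        # Hypothetical data frame: two predictors and the response column "price".
        foo <- data.frame(area  = c(0, 1, 0, 1, 2),
                          rooms = c(0, 0, 1, 1, 1),
                          price = c(1, 3, 4, 6, 8))

        # "price ~ ." means: model price using every other column of foo.
        fit <- lm(price ~ ., data = foo)

        # The predictors that the dot expanded to.
        predictors <- attr(terms(fit), "term.labels")

        # write() goes through cat(), so there is no "[1]" prefix;
        # ncolumns = 1 puts each value on its own line.
        write(round(fitted(fit), 2), stdout(), ncolumns = 1)
        ```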

        • AffineStructure 9 years ago + 1 comment

          When you did "lm(price ~ ., data=foo)", did you have the data saved in a data frame?
          I was having trouble doing it this way because price needs to be assigned to the last vector. Did you just rename the final column of the data frame to price, or did you have all the vectors floating free in the global environment?

          • BryanRJ 9 years ago + 0 comments

            The "data=foo" argument there tells R that data are supposed to come from the frame named "foo".

            I always recommend building a data frame before performing regression or other modeling. It makes things easier in R.
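
            One way to get such a data frame when the number of feature columns is not known in advance is to rename only the last column. This is a sketch under assumptions, not necessarily what was done in the original solution; raw is a made-up table with the default column names V1, V2, V3.

            ```r
            # Hypothetical raw table; the last column holds the price.
            raw <- data.frame(V1 = c(0, 1, 0, 1),
                              V2 = c(0, 0, 1, 1),
                              V3 = c(1, 3, 4, 6))

            # Rename only the last column, whatever the number of feature columns.
            foo <- raw
            names(foo)[ncol(foo)] <- "price"

            # The dot now picks up every remaining column as a predictor.
            fit <- lm(price ~ ., data = foo)
            ```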

        • alexey_filippov 9 years ago + 0 comments

          On strsplit, scan actually does the same out of the box:

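          # Read all of stdin as lines.
          nums <- suppressWarnings(readLines(file("stdin")))

          # scan(text=) parses the space-separated numbers on the first line.
          fn <- scan(text=nums[1])
          f <- fn[1]  # number of features
          n <- fn[2]  # number of training rows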

          On the coefficients, there's no need to extract anything: as soon as you've got the fitted model, you can use predict to apply it to a data frame of queries.
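
          For instance, a minimal sketch with made-up data (train and queries are hypothetical names; the query columns have to carry the same names as the predictors in the training frame):

          ```r
          # Hypothetical training data: two features and the price.
          train <- data.frame(V1    = c(0, 1, 0, 1),
                              V2    = c(0, 0, 1, 1),
                              price = c(1, 3, 4, 6))
          fit <- lm(price ~ ., data = train)

          # Queries use the same column names as the predictors.
          queries <- data.frame(V1 = c(2, 3), V2 = c(2, 1))

          # predict applies the fitted model to each query row.
          preds <- predict(fit, newdata = queries)
          ```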

          On printing, StackOverflow users suggest something along the lines of,

          write.table(cat(format(answer, nsmall=1), sep="\n"), sep = "", append=T, row.names = F, col.names = F)
          

          Frankly, I don't really understand how cat and write.table interact here, but it seems to work just fine.
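
          A possible explanation, offered as a sketch rather than a definitive account: cat prints its arguments and returns NULL, so write.table is handed NULL and the visible output comes from cat. If so, cat alone should be enough. The vector answer below is a made-up stand-in for the predictions.

          ```r
          answer <- c(11, 10.5)  # hypothetical predictions

          # cat prints one formatted value per line and returns NULL.
          ret <- cat(format(answer, nsmall = 1), sep = "\n")

          # ret is NULL, so a surrounding write.table call receives no data.
          stopifnot(is.null(ret))
          ```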

        • andreymir 9 years ago + 1 comment

          It's easy to read the data with read.delim(file="stdin", header = FALSE, sep = " ")

          • prlpzb 9 years ago + 0 comments

            I agree that using read.delim or read.table is a lot easier than reading lines and then trying to build vectors and frames. The trick is to read the whole input as a data frame and then slice it to get the problem data.

            Example:

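            # Read the whole input; fill=TRUE pads short rows with NA.
            tot=read.table(file="stdin",header=FALSE,fill=TRUE,sep=" ")
            f=tot$V1[1]                # number of features
            n=tot$V2[1]                # number of training rows
            origen=tot[2:(n+1),]       # training rows: features plus price
            t=tot$V1[n+2]              # number of query rows
            desti=tot[(n+3):(n+2+t),]  # query rows (last column is NA)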
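
            The slicing can be tried without stdin by passing a string through read.table's text argument. The sketch below uses a made-up input, the hypothetical name t_count instead of t, and additionally drops the NA padding column from the query rows so they can go straight into predict; those last steps are an assumption about how the slices would be used.

            ```r
            # Made-up input in the challenge's layout.
            input <- "2 4\n0 0 1\n1 0 3\n0 1 4\n1 1 6\n2\n2 2\n3 1"
            tot <- read.table(text = input, header = FALSE, fill = TRUE, sep = " ")

            f <- tot$V1[1]
            n <- tot$V2[1]
            origen  <- tot[2:(n + 1), ]
            t_count <- tot$V1[n + 2]

            # Keep only the f feature columns; the rest is NA padding from fill = TRUE.
            desti <- tot[(n + 3):(n + 2 + t_count), 1:f, drop = FALSE]

            # Name the last training column and fit on all the others.
            names(origen)[f + 1] <- "price"
            fit   <- lm(price ~ ., data = origen)
            preds <- predict(fit, newdata = desti)
            ```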
