Day 6: Multiple Linear Regression: Predicting House Prices
-
alexey_filippov 9 years ago R is an amazing language indeed: I struggle to write a concatenation, but a linear model is so easy! lm, and you're done! And there's even more! Still, even reading stdin is a pain. On the plus side, all the stats stuff just works.
-
AffineStructure 9 years ago Writing this in R was frustrating. Some things I learned on my way:
1. How to read in data with readLines and use strsplit to put the data neatly into data frames.
2. How to extract the coefficients from lm so they can be multiplied with the data frames.
3. I struggled with how to properly call lm without a fixed number of vectors, since I had problems with the data type. I finally realized that lm can take a matrix as input, so we had to do something like lm(df1[[length(df1)]] ~ as.matrix(df1[-(length(df1))]))
4. How to print the result properly, since R puts an annoying index at the beginning of all of your prints. I used a for loop and the cat function with a line break. HackerRank received the answer nicely that way.
-
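The four points above can be sketched end to end roughly like this. It is a minimal sketch, not the poster's actual code: the input lines are made up (price = 1 + 2*x1 + 3*x2) and held in a character vector so the example runs on its own, and the names rows, queries, t_count and answer are only illustrative.

```r
# Made-up input in the "F N / N rows / T / T rows" layout; a real
# submission would use readLines(file("stdin")) instead.
input <- c("2 4",
           "0 0 1",
           "1 0 3",
           "0 1 4",
           "1 1 6",
           "2",
           "2 2",
           "3 1")

# 1. Split every line on spaces and convert the pieces to numbers.
rows <- lapply(strsplit(input, " "), as.numeric)
f <- rows[[1]][1]   # number of features
n <- rows[[1]][2]   # number of training rows

# Training rows -> data frame whose last column is the price.
df1 <- as.data.frame(do.call(rbind, rows[2:(n + 1)]))

# Query rows -> plain numeric matrix.
t_count <- rows[[n + 2]][1]
queries <- do.call(rbind, rows[(n + 3):(n + 2 + t_count)])

# 3. A matrix on the right-hand side, so the feature count is not fixed.
y <- df1[[length(df1)]]
X <- as.matrix(df1[-length(df1)])
fit <- lm(y ~ X)

# 2. Intercept plus the queries multiplied by the slope coefficients.
b <- unname(coef(fit))
answer <- as.vector(b[1] + queries %*% b[-1])

# 4. cat prints the values without the "[1]" index prefix; no loop needed.
cat(format(round(answer, 2), nsmall = 2), sep = "\n")
```

Note that cat accepts a whole vector together with sep = "\n", so the for loop from point 4 is optional.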
BryanRJ 9 years ago Or you can do "lm(price ~ ., data=foo)", which will create a linear regression model with price as the dependent (response) variable and all other columns in foo as independent (predictor) variables.
Re: printing, try "write(foo, stdout())", which will be a non-pretty print.
-
AffineStructure 9 years ago When you did "lm(price ~ ., data=foo)", did you have the data saved in a data frame?
I was having trouble doing it this way because price needs to be assigned to the last vector. Did you just rename the final column of the data frame to price, or did you have all the vectors floating free in the global environment?
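One way this could work, whatever the number of features, is to keep everything in a single data frame and rename its columns so that the last one is called price. A minimal sketch with made-up data; the column names x1, x2 are an assumption for illustration, not something BryanRJ stated:

```r
# Made-up training data: two feature columns, with the price last.
foo <- data.frame(c(0, 1, 0, 1),
                  c(0, 0, 1, 1),
                  c(1, 3, 4, 6))

# Name the columns so the last one is "price", for any feature count.
f <- ncol(foo) - 1
names(foo) <- c(paste0("x", seq_len(f)), "price")

# "." stands for every column of foo other than price.
fit <- lm(price ~ ., data = foo)

# Coefficients as bare numbers, one per line, without the "[1]" prefix.
write(round(coef(fit), 4), stdout(), ncolumns = 1)
```

With the data frame set up this way, nothing needs to live as a free vector in the global environment.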
-
-
alexey_filippov 9 years ago On strsplit, actually scan does the same out of the box:
nums <- suppressWarnings(readLines(file("stdin")))
fn <- scan(text=nums[1])
f <- fn[1]
n <- fn[2]
On the coefficients, there's no need to extract anything: as soon as you've got the fitted model, you can use predict to apply it to a data frame of queries.
On printing, StackOverflow users suggest something along the lines of,
write.table(cat(format(answer, nsmall=1), sep="\n"), sep = "", append=T, row.names = F, col.names = F)
Frankly, I don't really understand how cat and write.table interact here, but it seems to work just fine.
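For what it's worth, here is a small sketch of the predict route and of the printing, with made-up data (price = 1 + 2*x1 + 3*x2). As far as I can tell, cat does the actual printing and returns NULL, which leaves the outer write.table with nothing to write, so cat on its own should be enough:

```r
# Made-up training data; the last column is the price.
train <- data.frame(x1 = c(0, 1, 0, 1),
                    x2 = c(0, 0, 1, 1),
                    price = c(1, 3, 4, 6))
fit <- lm(price ~ ., data = train)

# Queries must use the same feature column names as the training data.
queries <- data.frame(x1 = c(2, 3), x2 = c(2, 1))
answer <- predict(fit, newdata = queries)

# cat prints one value per line and returns NULL.
printed <- cat(format(round(answer, 2), nsmall = 2), sep = "\n")
stopifnot(is.null(printed))
```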
-
-
andreymir 9 years ago Easy to read data with
read.delim(file="stdin", header = FALSE, sep = " ")
-
prlpzb 9 years ago I agree that using read.delim or read.table is a lot easier than reading lines and then trying to build vectors and frames. The trick is to read the whole input as a data frame and then slice it to get the problem data.
Example:
tot=read.table(file="stdin",header=FALSE,fill=TRUE,sep=" ")
f=tot$V1[1]
n=tot$V2[1]
origen=tot[2:(n+1),]
t=tot$V1[n+2]
desti=tot[(n+3):(n+2+t),]
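To show where the slicing leads, here is a hedged sketch that carries the same idea through to the fit and the predictions. The input is made up and passed through text= so the example runs without stdin; n_queries and the column trimming are my additions, not part of the post above. One caveat: with fill=TRUE, read.table decides the number of columns from the first five lines, so at least one training row has to appear among them.

```r
# Made-up input (price = 1 + 2*x1 + 3*x2); a real submission would
# pass file = "stdin" instead of text = input.
input <- "2 4
0 0 1
1 0 3
0 1 4
1 1 6
2
2 2
3 1"

tot <- read.table(text = input, header = FALSE, fill = TRUE, sep = " ")
f <- tot$V1[1]            # number of features
n <- tot$V2[1]            # number of training rows

# Training block: f feature columns plus the price column.
origen <- tot[2:(n + 1), 1:(f + 1)]
names(origen)[f + 1] <- "price"

# Query block: keep only the f feature columns (the rest are NA padding).
n_queries <- tot$V1[n + 2]
desti <- tot[(n + 3):(n + 2 + n_queries), 1:f, drop = FALSE]

fit <- lm(price ~ ., data = origen)
answer <- predict(fit, newdata = desti)
cat(format(round(answer, 2), nsmall = 2), sep = "\n")
```

The feature columns keep their default names (V1, V2, ...) in both slices, which is what lets predict match them up.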
-
-
-
AlejandroBlanco 9 years ago 49.13... What a pain!
My results on the test case are:
82.28545033210533
159.9594001121738
138.99344089799777
117.35990799068198
with a score of 0.965.
I was wondering if anyone else is getting similar results. I used Java, by the way. Do you think rounding in the double type could be responsible? There could be something wrong in the calculation, but I think it is straightforward. If there were a problem, the whole code would crash rather than give me an approximate result.
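One way to rule out a mistake in the calculation itself is to compare a hand-rolled solution against lm on a tiny data set with known coefficients. A minimal sketch in R, assuming the usual normal-equations approach (the post above does not say which method the Java code uses); the data is made up:

```r
# Made-up data: price = 1 + 2*x1 + 3*x2 exactly.
X <- cbind(1, c(0, 1, 0, 1), c(0, 0, 1, 1))  # leading column of ones = intercept
y <- c(1, 3, 4, 6)

# Normal equations: solve (X'X) b = X'y for the coefficient vector b.
b <- solve(t(X) %*% X, t(X) %*% y)

# The same fit through lm, for comparison.
fit <- lm(y ~ X[, -1])

# Largest absolute difference between the two sets of coefficients.
max_diff <- max(abs(as.vector(b) - unname(coef(fit))))
print(max_diff)
```

If the two agree to many decimal places on small cases like this, double rounding is unlikely to explain a visibly different answer; a missing intercept column or a mix-up in the input layout would be more plausible suspects.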
-
andreymir 9 years ago Trying to solve this in R and the answer I'm getting for the sample data is close:
105.214558351069 142.670951307299 132.936054691247 129.701754045025
and the sample output is:
105.22 142.68 132.94 129.71
But it is not precise, so I'm not getting the full score, although the other two tests are marked as passed when I submit this solution. It is rated with a score of 30.38. I'm using the lm function to calculate the coefficients.
-
nicocai 9 years ago Test case #2 always times out, while the others are just fine. Any tips?
-
geminigal 9 years ago The next problem starts when the contest ends. Can that be correct?