We use cookies to ensure you have the best browsing experience on our website. Please read our cookie policy for more information about how we use cookies.
If you look at the table, some challenge_ids
s occur more than once. By using group by challenge_id the group function (sum) is only applied to those rows with the same challenge_id, thus giving the total submissions etc for the challenge with id challenge_id
But my thought was the schema appears as if the challenge_id is unique and each challenge_id is associated with only one row. It would have been more intuitive if challenge_id, total_views, total_unique_views, date was provided in schema. Only when I relooked the table I was able to get it.
I know we should group challenge_id because only do this I can get the correct answer. However, I still don't understand why is that?
In my opinion, even though the challenge_id have repeition, this dosen't matter because we just want the summation.
For example:
we have:
contest1 challenge_id1 total_view = 1
contest1 challenge_id1 total_view = 2
contest1 challange_id2 total_view = 3
sum(total_views) is 1+2+3 = 6
and sum stats group by challenge_id and then sum group by contest is (1+2= 3, 3), 3+3 = 6
I think they are the same
Even though we may have null values when left join, the null value will be ignore.
Look at the sample input in the problem statement. In the table Challenges there is a challenge_id 18765. This ID is not present in the Submission_Stats table, but is present in the View_Stats table. So, if you do the inner join on Challenges and Submissions_Stats first, there will be no rows with challenge_id 18765 in the result set. And when you do the second inner join with Vew_Stats table, the rows with challenge_id 18765 in it will not be regarded.
If you do the left join, the rows with challege_id 18765 from the Challenges table (the left table) will be preserved in the result set even if there are no rows with this id in the rgiht table.
I ended up getting rid of the last CTE I had made and instead putting it all in the final select statement similar to yours, but at first I had the following and for some rows I got the wrong output (lower than expected), althought most were correct. Can somebody help me understanad why?
Thank you
WITHAAS(SELECTchallenge_id,SUM(total_submissions)AStotal_challenge_submissions,SUM(total_accepted_submissions)AStotal_challenge_accepted_submissionsFROMSubmission_StatsGROUPBYchallenge_id),BAS(SELECTchallenge_id,SUM(total_views)AStotal_challenge_views,SUM(total_unique_views)AStotal_challenge_unique_viewsFROMView_StatsGROUPBYchallenge_id),/*In final solution, this CTE was deleted and logic moved down to final SELECT statement*/CAS(SELECThacker.hacker_idAShacker_id,hacker.nameASname,college.contest_idAScontest_id,college.college_idAScollege_id,challenge.challenge_idASchallenge_idFROMContestsAShackerJOINCollegescollegeONhacker.contest_id=college.contest_idJOINChallengesASchallengeONcollege.college_id=challenge.college_id)SELECTC.contest_id,C.hacker_id,C.name,SUM(A.total_challenge_submissions),SUM(A.total_challenge_accepted_submissions),SUM(B.total_challenge_views),SUM(B.total_challenge_unique_views)FROMCLEFTJOINBONC.challenge_id=B.challenge_idLEFTJOINAONA.challenge_id=B.challenge_idGROUPBYC.contest_id,C.hacker_id,C.nameHAVINGSUM(A.total_challenge_submissions)+SUM(A.total_challenge_accepted_submissions)+SUM(B.total_challenge_views)+SUM(B.total_challenge_unique_views)<>0ORDERBYC.contest_id;
@dwightrobinson Mine is so similar to yours- can you spot the difference and please help me with why I am wrong?
`
WITH view_stats_agg as ( select challenge_id
,sum(total_views) as total_views
,sum(total_unique_views) as total_unique_views
from View_Stats
group by challenge_id
),
submission_stats_agg as ( select challenge_id
,sum(total_submissions) as total_submissions
,sum(total_accepted_submissions) as total_accepted_submissions
from Submission_Stats
group by challenge_id
),
basic_info as (select con.*, col.college_id, chal.challenge_id
from Contests con
inner join Colleges col
on con.college_id = col.college_id
inner join Challenges chal
on col.challenge_id = chal.challenge_id )
select contest_id
,hacker_id
,name
,sum(total_submissions) as total_submissions
,sum(total_accepted_submissions) as total_accepted_submissions
,sum(total_views) as total_views
,sum(total_unique_views) as total_unique_views
from basic_info b
left join view_stats_agg v
on b.challenge_id = v.challenge_id
left join submission_stats_agg s
on v.challenge_id = s.challenge_id
group by contest_id
,hacker_id
,name
having ( sum(total_views)
+sum(total_unique_views)
+sum(total_submissions)
+sum(total_accepted_submissions) ) <> 0
where NATURAL is equivalent to USING (challenge_id), and it turns out that ORDER isn't needed since the Contests come already in order.
PS1: in MySQL 8 we could also use NATURAL INNER ... rather than of INNER ... USING(...), but this is not possible in MySQL 5.
PS2: Note that always total submissions >= accepted ones, and total views >= unique views, so it's enough to test sum(ts)>0 or sum(tv)>0 rather than the sum of all four.
Interviews
You are viewing a single comment's thread. Return to all comments →
Using CTE to group the Stats and Submission tables first is crucial, after that, its just a matter of left joining the results.
Hi, May I know why we have to group the stats and submission?
If you look at the table, some
challenge_id
s s occur more than once. By usinggroup by challenge_id
the group function (sum
) is only applied to those rows with the samechallenge_id
, thus giving the total submissions etc for the challenge with idchallenge_id
Got it! thanks!
But my thought was the schema appears as if the challenge_id is unique and each challenge_id is associated with only one row. It would have been more intuitive if challenge_id, total_views, total_unique_views, date was provided in schema. Only when I relooked the table I was able to get it.
Hey, so you mean to obviate the use of complex queries and making the query as a simple one..
I know we should group challenge_id because only do this I can get the correct answer. However, I still don't understand why is that? In my opinion, even though the challenge_id have repeition, this dosen't matter because we just want the summation. For example: we have: contest1 challenge_id1 total_view = 1 contest1 challenge_id1 total_view = 2 contest1 challange_id2 total_view = 3
sum(total_views) is 1+2+3 = 6 and sum stats group by challenge_id and then sum group by contest is (1+2= 3, 3), 3+3 = 6 I think they are the same Even though we may have null values when left join, the null value will be ignore.
So why this not correct?
Hi, may I know why it's using left join for view and submission? I use inner join and it didn't pass, when I change to left, then it passes. thanks.
Look at the sample input in the problem statement. In the table Challenges there is a challenge_id 18765. This ID is not present in the Submission_Stats table, but is present in the View_Stats table. So, if you do the inner join on Challenges and Submissions_Stats first, there will be no rows with challenge_id 18765 in the result set. And when you do the second inner join with Vew_Stats table, the rows with challenge_id 18765 in it will not be regarded.
If you do the left join, the rows with challege_id 18765 from the Challenges table (the left table) will be preserved in the result set even if there are no rows with this id in the rgiht table.
Awesome response, makes it so easy to understand. I had the same issue
Hello,
I ended up getting rid of the last CTE I had made and instead putting it all in the final select statement similar to yours, but at first I had the following and for some rows I got the wrong output (lower than expected), althought most were correct. Can somebody help me understanad why?
Thank you
@dwightrobinson Mine is so similar to yours- can you spot the difference and please help me with why I am wrong?
` WITH view_stats_agg as ( select challenge_id ,sum(total_views) as total_views ,sum(total_unique_views) as total_unique_views from View_Stats group by challenge_id ),
submission_stats_agg as ( select challenge_id ,sum(total_submissions) as total_submissions ,sum(total_accepted_submissions) as total_accepted_submissions from Submission_Stats group by challenge_id ),
basic_info as (select con.*, col.college_id, chal.challenge_id from Contests con inner join Colleges col on con.college_id = col.college_id inner join Challenges chal on col.challenge_id = chal.challenge_id )
select contest_id ,hacker_id ,name ,sum(total_submissions) as total_submissions ,sum(total_accepted_submissions) as total_accepted_submissions ,sum(total_views) as total_views ,sum(total_unique_views) as total_unique_views
from basic_info b left join view_stats_agg v on b.challenge_id = v.challenge_id left join submission_stats_agg s on v.challenge_id = s.challenge_id group by contest_id ,hacker_id ,name
having ( sum(total_views) +sum(total_unique_views) +sum(total_submissions) +sum(total_accepted_submissions) ) <> 0
order by contest_id `
Unfortunately it seems they don't have MySQL 8.0 and MySQL 5.7 does not have WITH, so one has to use subqueries directly, viz
where
NATURAL
is equivalent toUSING (challenge_id)
, and it turns out thatORDER
isn't needed since the Contests come already in order.PS1: in MySQL 8 we could also use
NATURAL INNER ...
rather than ofINNER ... USING(...)
, but this is not possible in MySQL 5.PS2: Note that always total submissions >= accepted ones, and total views >= unique views, so it's enough to test sum(ts)>0 or sum(tv)>0 rather than the sum of all four.
This won't work with HackerRank. HackerRank uses version 5. It's a bit perplexing why they havent updated by this point.
Why we are using left join?
Hacker rank MySQL version is 5.7, which I believe does not support CTE :(. Correct me if wrong!