Discussion on Interviews Challenge

8 years ago+ 9 comments

Using CTE to group the Stats and Submission tables first is crucial, after that, its just a matter of left joining the results.

WITH SUM_View_Stats AS (
SELECT challenge_id
    , total_views = sum(total_views)
    , total_unique_views = sum(total_unique_views)
FROM View_Stats 
GROUP BY challenge_id
)
,SUM_Submission_Stats AS (
SELECT challenge_id
    , total_submissions = sum(total_submissions)
    , total_accepted_submissions = sum(total_accepted_submissions)
FROM Submission_Stats 
GROUP BY challenge_id
)

SELECT    con.contest_id
        , con.hacker_id
        , con.name
        , SUM(total_submissions)
        , sum(total_accepted_submissions)
        , sum(total_views)
        , sum(total_unique_views)
FROM Contests con
INNER JOIN Colleges col
    ON con.contest_id = col.contest_id
INNER JOIN Challenges cha
    ON cha.college_id = col.college_id
LEFT JOIN SUM_View_Stats vs
    ON vs.challenge_id = cha.challenge_id
LEFT JOIN SUM_Submission_Stats ss
    ON ss.challenge_id = cha.challenge_id
GROUP BY con.contest_id,con.hacker_id,con.name
HAVING (SUM(total_submissions)
        +sum(total_accepted_submissions)
        +sum(total_views)
        +sum(total_unique_views)) <> 0
ORDER BY con.contest_ID

8 years ago+ 1 comment

Hi, May I know why we have to group the stats and submission?

8 years ago+ 2 comments

If you look at the table, some challenge_ids s occur more than once. By using group by challenge_id the group function (sum) is only applied to those rows with the same challenge_id, thus giving the total submissions etc for the challenge with id challenge_id

8 years ago+ 1 comment

Got it! thanks!

7 years ago+ 1 comment

But my thought was the schema appears as if the challenge_id is unique and each challenge_id is associated with only one row. It would have been more intuitive if challenge_id, total_views, total_unique_views, date was provided in schema. Only when I relooked the table I was able to get it.

6 years ago+ 0 comments

Hey, so you mean to obviate the use of complex queries and making the query as a simple one..

6 years ago+ 0 comments

I know we should group challenge_id because only do this I can get the correct answer. However, I still don't understand why is that? In my opinion, even though the challenge_id have repeition, this dosen't matter because we just want the summation. For example: we have: contest1 challenge_id1 total_view = 1 contest1 challenge_id1 total_view = 2 contest1 challange_id2 total_view = 3

sum(total_views) is 1+2+3 = 6 and sum stats group by challenge_id and then sum group by contest is (1+2= 3, 3), 3+3 = 6 I think they are the same Even though we may have null values when left join, the null value will be ignore.

So why this not correct?

8 years ago+ 2 comments

Hi, may I know why it's using left join for view and submission? I use inner join and it didn't pass, when I change to left, then it passes. thanks.

8 years ago+ 1 comment

Look at the sample input in the problem statement. In the table Challenges there is a challenge_id 18765. This ID is not present in the Submission_Stats table, but is present in the View_Stats table. So, if you do the inner join on Challenges and Submissions_Stats first, there will be no rows with challenge_id 18765 in the result set. And when you do the second inner join with Vew_Stats table, the rows with challenge_id 18765 in it will not be regarded.

If you do the left join, the rows with challege_id 18765 from the Challenges table (the left table) will be preserved in the result set even if there are no rows with this id in the rgiht table.

6 years ago+ 0 comments

Awesome response, makes it so easy to understand. I had the same issue

View more Comments..

7 years ago+ 0 comments

Hello,

I ended up getting rid of the last CTE I had made and instead putting it all in the final select statement similar to yours, but at first I had the following and for some rows I got the wrong output (lower than expected), althought most were correct. Can somebody help me understanad why?

Thank you

WITH A AS (
SELECT challenge_id, SUM(total_submissions) AS total_challenge_submissions, SUM(total_accepted_submissions) AS total_challenge_accepted_submissions
FROM Submission_Stats
GROUP BY challenge_id),

B AS (
SELECT challenge_id, SUM(total_views) AS total_challenge_views, SUM(total_unique_views) AS total_challenge_unique_views
FROM View_Stats
GROUP BY challenge_id), 

/*In final solution, this CTE was deleted and logic moved down to final SELECT statement*/
C AS (
SELECT 
    hacker.hacker_id AS hacker_id, 
    hacker.name AS name, 
    college.contest_id AS contest_id, 
    college.college_id AS college_id,  
    challenge.challenge_id AS challenge_id
FROM Contests AS hacker
JOIN Colleges college ON hacker.contest_id = college.contest_id
JOIN Challenges AS challenge ON college.college_id = challenge.college_id)

SELECT
    C.contest_id, 
    C.hacker_id, 
    C.name, 
    SUM(A.total_challenge_submissions), 
    SUM(A.total_challenge_accepted_submissions), 
    SUM(B.total_challenge_views), 
    SUM(B.total_challenge_unique_views)
FROM C
LEFT JOIN B ON C.challenge_id = B.challenge_id
LEFT JOIN A ON A.challenge_id = B.challenge_id
GROUP BY C.contest_id, C.hacker_id, C.name
HAVING 
    SUM(A.total_challenge_submissions)  
    + SUM(A.total_challenge_accepted_submissions) 
    + SUM(B.total_challenge_views) 
    + SUM(B.total_challenge_unique_views) <> 0
ORDER BY C.contest_id;

7 years ago+ 0 comments

@dwightrobinson Mine is so similar to yours- can you spot the difference and please help me with why I am wrong?

` WITH view_stats_agg as ( select challenge_id ,sum(total_views) as total_views ,sum(total_unique_views) as total_unique_views from View_Stats group by challenge_id ),

submission_stats_agg as ( select challenge_id ,sum(total_submissions) as total_submissions ,sum(total_accepted_submissions) as total_accepted_submissions from Submission_Stats group by challenge_id ),

basic_info as (select con.*, col.college_id, chal.challenge_id from Contests con inner join Colleges col on con.college_id = col.college_id inner join Challenges chal on col.challenge_id = chal.challenge_id )

select contest_id ,hacker_id ,name ,sum(total_submissions) as total_submissions ,sum(total_accepted_submissions) as total_accepted_submissions ,sum(total_views) as total_views ,sum(total_unique_views) as total_unique_views

from basic_info b left join view_stats_agg v on b.challenge_id = v.challenge_id left join submission_stats_agg s on v.challenge_id = s.challenge_id group by contest_id ,hacker_id ,name

having ( sum(total_views) +sum(total_unique_views) +sum(total_submissions) +sum(total_accepted_submissions) ) <> 0

order by contest_id `

6 years ago+ 0 comments

Unfortunately it seems they don't have MySQL 8.0 and MySQL 5.7 does not have WITH, so one has to use subqueries directly, viz

SELECT ... FROM Contests c
INNER JOIN Colleges   USING (contest_id)
INNER JOIN Challenges USING (college_id)
NATURAL LEFT JOIN ( SELECT ... sum(...) as tv, ...) vv
NATURAL LEFT JOIN ( SELECT ... sum(...) as ts, ...) ss
GROUP BY c.contest_id, c.hacker_id, c.name HAVING sum(ts) > 0 OR sum(tv) > 0

where NATURAL is equivalent to USING (challenge_id), and it turns out that ORDER isn't needed since the Contests come already in order.

PS1: in MySQL 8 we could also use NATURAL INNER ... rather than of INNER ... USING(...), but this is not possible in MySQL 5.

PS2: Note that always total submissions >= accepted ones, and total views >= unique views, so it's enough to test sum(ts)>0 or sum(tv)>0 rather than the sum of all four.

[deleted] 5 years ago+ 0 comments

This won't work with HackerRank. HackerRank uses version 5. It's a bit perplexing why they havent updated by this point.

4 years ago+ 0 comments

Why we are using left join?

3 years ago+ 0 comments

Hacker rank MySQL version is 5.7, which I believe does not support CTE :(. Correct me if wrong!

View more Comments..

Cookie support is required to access HackerRank

Please signup or login in order to view this challenge