• + 9 comments

    Using CTE to group the Stats and Submission tables first is crucial, after that, its just a matter of left joining the results.

    WITH SUM_View_Stats AS (
    SELECT challenge_id
        , total_views = sum(total_views)
        , total_unique_views = sum(total_unique_views)
    FROM View_Stats 
    GROUP BY challenge_id
    )
    ,SUM_Submission_Stats AS (
    SELECT challenge_id
        , total_submissions = sum(total_submissions)
        , total_accepted_submissions = sum(total_accepted_submissions)
    FROM Submission_Stats 
    GROUP BY challenge_id
    )
    
    SELECT    con.contest_id
            , con.hacker_id
            , con.name
            , SUM(total_submissions)
            , sum(total_accepted_submissions)
            , sum(total_views)
            , sum(total_unique_views)
    FROM Contests con
    INNER JOIN Colleges col
        ON con.contest_id = col.contest_id
    INNER JOIN Challenges cha
        ON cha.college_id = col.college_id
    LEFT JOIN SUM_View_Stats vs
        ON vs.challenge_id = cha.challenge_id
    LEFT JOIN SUM_Submission_Stats ss
        ON ss.challenge_id = cha.challenge_id
    GROUP BY con.contest_id,con.hacker_id,con.name
    HAVING (SUM(total_submissions)
            +sum(total_accepted_submissions)
            +sum(total_views)
            +sum(total_unique_views)) <> 0
    ORDER BY con.contest_ID
    
    • + 1 comment

      Hi, May I know why we have to group the stats and submission?

      • + 2 comments

        If you look at the table, some challenge_ids s occur more than once. By using group by challenge_id the group function (sum) is only applied to those rows with the same challenge_id, thus giving the total submissions etc for the challenge with id challenge_id

        • + 1 comment

          Got it! thanks!

          • + 1 comment

            But my thought was the schema appears as if the challenge_id is unique and each challenge_id is associated with only one row. It would have been more intuitive if challenge_id, total_views, total_unique_views, date was provided in schema. Only when I relooked the table I was able to get it.

            • + 0 comments

              Hey, so you mean to obviate the use of complex queries and making the query as a simple one..

        • + 0 comments

          I know we should group challenge_id because only do this I can get the correct answer. However, I still don't understand why is that? In my opinion, even though the challenge_id have repeition, this dosen't matter because we just want the summation. For example: we have: contest1 challenge_id1 total_view = 1 contest1 challenge_id1 total_view = 2 contest1 challange_id2 total_view = 3

          sum(total_views) is 1+2+3 = 6 and sum stats group by challenge_id and then sum group by contest is (1+2= 3, 3), 3+3 = 6 I think they are the same Even though we may have null values when left join, the null value will be ignore.

          So why this not correct?

    • + 2 comments

      Hi, may I know why it's using left join for view and submission? I use inner join and it didn't pass, when I change to left, then it passes. thanks.

      • + 1 comment

        Look at the sample input in the problem statement. In the table Challenges there is a challenge_id 18765. This ID is not present in the Submission_Stats table, but is present in the View_Stats table. So, if you do the inner join on Challenges and Submissions_Stats first, there will be no rows with challenge_id 18765 in the result set. And when you do the second inner join with Vew_Stats table, the rows with challenge_id 18765 in it will not be regarded.

        If you do the left join, the rows with challege_id 18765 from the Challenges table (the left table) will be preserved in the result set even if there are no rows with this id in the rgiht table.

        • + 0 comments

          Awesome response, makes it so easy to understand. I had the same issue

    • + 0 comments

      Hello,

      I ended up getting rid of the last CTE I had made and instead putting it all in the final select statement similar to yours, but at first I had the following and for some rows I got the wrong output (lower than expected), althought most were correct. Can somebody help me understanad why?

      Thank you

      WITH A AS (
      SELECT challenge_id, SUM(total_submissions) AS total_challenge_submissions, SUM(total_accepted_submissions) AS total_challenge_accepted_submissions
      FROM Submission_Stats
      GROUP BY challenge_id),
      
      B AS (
      SELECT challenge_id, SUM(total_views) AS total_challenge_views, SUM(total_unique_views) AS total_challenge_unique_views
      FROM View_Stats
      GROUP BY challenge_id), 
      
      /*In final solution, this CTE was deleted and logic moved down to final SELECT statement*/
      C AS (
      SELECT 
          hacker.hacker_id AS hacker_id, 
          hacker.name AS name, 
          college.contest_id AS contest_id, 
          college.college_id AS college_id,  
          challenge.challenge_id AS challenge_id
      FROM Contests AS hacker
      JOIN Colleges college ON hacker.contest_id = college.contest_id
      JOIN Challenges AS challenge ON college.college_id = challenge.college_id)
      
      SELECT
          C.contest_id, 
          C.hacker_id, 
          C.name, 
          SUM(A.total_challenge_submissions), 
          SUM(A.total_challenge_accepted_submissions), 
          SUM(B.total_challenge_views), 
          SUM(B.total_challenge_unique_views)
      FROM C
      LEFT JOIN B ON C.challenge_id = B.challenge_id
      LEFT JOIN A ON A.challenge_id = B.challenge_id
      GROUP BY C.contest_id, C.hacker_id, C.name
      HAVING 
          SUM(A.total_challenge_submissions)  
          + SUM(A.total_challenge_accepted_submissions) 
          + SUM(B.total_challenge_views) 
          + SUM(B.total_challenge_unique_views) <> 0
      ORDER BY C.contest_id; 
      
    • + 0 comments

      @dwightrobinson Mine is so similar to yours- can you spot the difference and please help me with why I am wrong?

      ` WITH view_stats_agg as ( select challenge_id ,sum(total_views) as total_views ,sum(total_unique_views) as total_unique_views from View_Stats group by challenge_id ),

      submission_stats_agg as ( select challenge_id ,sum(total_submissions) as total_submissions ,sum(total_accepted_submissions) as total_accepted_submissions from Submission_Stats group by challenge_id ),

      basic_info as (select con.*, col.college_id, chal.challenge_id from Contests con inner join Colleges col on con.college_id = col.college_id inner join Challenges chal on col.challenge_id = chal.challenge_id )

      select contest_id ,hacker_id ,name ,sum(total_submissions) as total_submissions ,sum(total_accepted_submissions) as total_accepted_submissions ,sum(total_views) as total_views ,sum(total_unique_views) as total_unique_views

      from basic_info b left join view_stats_agg v on b.challenge_id = v.challenge_id left join submission_stats_agg s on v.challenge_id = s.challenge_id group by contest_id ,hacker_id ,name

      having ( sum(total_views) +sum(total_unique_views) +sum(total_submissions) +sum(total_accepted_submissions) ) <> 0

      order by contest_id `

    • + 0 comments

      Unfortunately it seems they don't have MySQL 8.0 and MySQL 5.7 does not have WITH, so one has to use subqueries directly, viz

      SELECT ... FROM Contests c
      INNER JOIN Colleges   USING (contest_id)
      INNER JOIN Challenges USING (college_id)
      NATURAL LEFT JOIN ( SELECT ... sum(...) as tv, ...) vv
      NATURAL LEFT JOIN ( SELECT ... sum(...) as ts, ...) ss
      GROUP BY c.contest_id, c.hacker_id, c.name HAVING sum(ts) > 0 OR sum(tv) > 0
      

      where NATURAL is equivalent to USING (challenge_id), and it turns out that ORDER isn't needed since the Contests come already in order.

      PS1: in MySQL 8 we could also use NATURAL INNER ... rather than of INNER ... USING(...), but this is not possible in MySQL 5.

      PS2: Note that always total submissions >= accepted ones, and total views >= unique views, so it's enough to test sum(ts)>0 or sum(tv)>0 rather than the sum of all four.

    • [deleted]
      + 0 comments

      This won't work with HackerRank. HackerRank uses version 5. It's a bit perplexing why they havent updated by this point.

    • + 0 comments

      Why we are using left join?

    • + 0 comments

      Hacker rank MySQL version is 5.7, which I believe does not support CTE :(. Correct me if wrong!