
    First off, don't worry if you don't know cricket. The fundamental problem here deals with range overlap, and the two sets (shots and fielding ranges) are equivalent data structures, so choose one of them arbitrarily. Each is a bunch of start and end data, likely with a lot of overlap. Let's say we're working with the "shots" data. We can give each "shot" range a score based on the number of "fielding" ranges it overlaps, and our final answer is the sum of all of the "shot" scores. (You could just as well start with the fielding ranges and compare those against the shots; either way yields the same total.)

    Similar to previous challenges, we can aggregate our data by incrementing a counter at each index where a range starts, and decrementing at the position just past the end of each range. We can then walk through this structure and keep a running sum of those increments and decrements as ranges start and stop, which tells us how many ranges are active at any given time. For this purpose, we are interested in any position where the NET number of active ranges changes.
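    As a sketch of that aggregation step (class and method names are my own, not from the problem statement), a sparse difference map built on a TreeMap might look like this:

```java
import java.util.Map;
import java.util.TreeMap;

public class DiffMap {
    // Build a sparse "difference" map: +1 where a range starts,
    // -1 just past where it ends. Keys exist only at positions of interest.
    static TreeMap<Integer, Integer> build(int[][] ranges) {
        TreeMap<Integer, Integer> diff = new TreeMap<>();
        for (int[] r : ranges) {
            diff.merge(r[0], 1, Integer::sum);       // range starts at r[0]
            diff.merge(r[1] + 1, -1, Integer::sum);  // range ends at r[1]
        }
        return diff;
    }

    public static void main(String[] args) {
        // Two overlapping ranges: [1,3] and [2,5]
        TreeMap<Integer, Integer> diff = build(new int[][]{{1, 3}, {2, 5}});
        int active = 0;  // running sum = number of currently active ranges
        for (Map.Entry<Integer, Integer> e : diff.entrySet()) {
            active += e.getValue();
            System.out.println("at " + e.getKey() + " active=" + active);
        }
    }
}
```

    Iterating the map in key order and keeping a running sum recovers the active-range count at every position of interest.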

    We will need TWO of these data structures: one for the "shots" range data and one for the "players" range data.

    For SCORING, we are interested in any position where a range begins, even if the net number of active zones doesn't change. Why? Imagine that we are evaluating "shots", and in a given shot's range, there are three corresponding fielding ranges which run consecutively, such that each time that one fielding range stops, the next picks up at the next index. The total number of active ranges remains the same throughout, but our shot gets credit for all three.

    So we evaluate scoring at a given index if one, the other, or both data structures have a range that starts at that index. If there is a drop-off, we still need to update our rolling count of active ranges, but the score can only increase if either a "shots" range or a "players" range begins at that index.

    Scoring is a little tricky. Every time a range begins in either data structure, we give it credit for the number of active ranges in the opposing data structure. But what if BOTH the shots and fielding data structures have ranges that begin at the same index? Here, it's important to separate the new increments from the ranges that were already active. Say we had 3 existing "shots" ranges, and at index "i" 2 new ones come online and 1 drops off, bringing the total number of active shot ranges to (3 + 2 - 1 = 4). At that same index, we had 2 existing player ranges, with 1 coming online and 1 dropping off, leaving us with the same number of player ranges (2 + 1 - 1 = 2). The 2 new shot ranges each get credit for the previously existing player range that is still active (1), so we increment our score by 2 to account for this contribution. The 2 new shot ranges also each get credit for the new player range that has come online (1), so we increment our score again by 2. So to account for the two new ranges in our "shots" data structure, we have incremented our score by (number of new shot ranges * number of active player ranges = 2 * 2 = 4), which we have to do any time a shot range begins. But what about the new range in the player data? We've accounted for the new additions in shots, but what about the two pre-existing shot ranges that didn't drop off? We have to give them credit for the new addition in the player-range data, so we add another (2 * 1 = 2), meaning that in total we've added 6 to our score.
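    The per-index bookkeeping above can be condensed into one formula: (new shots * players active after the update) + (surviving old shots * new players). A minimal sketch of that single step (names are mine, not from the problem):

```java
public class ScoreStep {
    // Score contribution at one index.
    // activeShots/activePlayers are counts BEFORE this index's changes;
    // *Starts/*Ends are how many ranges begin/end right here.
    static long delta(int activeShots, int shotStarts, int shotEnds,
                      int activePlayers, int playerStarts, int playerEnds) {
        int playersAfter = activePlayers + playerStarts - playerEnds;
        int shotsSurviving = activeShots - shotEnds;  // old shots still active
        // New shots see every active player range; surviving old shots
        // only gain credit for player ranges that just started.
        return (long) shotStarts * playersAfter
             + (long) shotsSurviving * playerStarts;
    }

    public static void main(String[] args) {
        // The worked example: 3 shots (+2, -1), 2 players (+1, -1)
        System.out.println(delta(3, 2, 1, 2, 1, 1)); // prints 6
    }
}
```

    Plugging in the example numbers reproduces the total of 6 derived above: 2 * 2 = 4 for the new shots, plus 2 * 1 = 2 for the surviving shots.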

    Clear as mud? Yeah, probably, but maybe this helped somebody.

    As far as implementation: since we need something that can hold both the increment and decrement data at each index, I found it useful to use a TreeSet with "roll your own" nodes that hold the required data along with an index for comparative sorting. That allowed me to maintain only the positions of interest instead of instantiating an array to hold every position, and it also helped while iterating through the data structure, since gaps in the index set could be skipped over. I would anticipate that the "fuller" the index set becomes, the less efficient this implementation (O(log n) seek) will be compared to an array-backed structure with constant seek time, since as your index set approaches "n", you're going to be evaluating every index anyway.
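    Putting the pieces together, here is one possible end-to-end sketch. I've used a TreeMap keyed by index rather than a TreeSet of custom nodes (same idea, slightly less code); the class and method names are my own:

```java
import java.util.Map;
import java.util.TreeMap;

public class CricketScore {
    // Total score: for each shot range, count the player ranges it
    // overlaps, summed over all shots. Ranges are inclusive [start, end].
    static long solve(int[][] shots, int[][] players) {
        // events.get(i) = {shotStarts, shotEnds, playerStarts, playerEnds}
        TreeMap<Integer, int[]> events = new TreeMap<>();
        for (int[] s : shots) {
            events.computeIfAbsent(s[0], k -> new int[4])[0]++;
            events.computeIfAbsent(s[1] + 1, k -> new int[4])[1]++;
        }
        for (int[] p : players) {
            events.computeIfAbsent(p[0], k -> new int[4])[2]++;
            events.computeIfAbsent(p[1] + 1, k -> new int[4])[3]++;
        }
        long score = 0;
        int activeShots = 0, activePlayers = 0;
        for (Map.Entry<Integer, int[]> e : events.entrySet()) {
            int[] v = e.getValue();
            int shotsSurviving = activeShots - v[1];
            int playersAfter = activePlayers + v[2] - v[3];
            // New shots see every active player range; surviving old
            // shots gain credit only for newly started player ranges.
            score += (long) v[0] * playersAfter
                   + (long) shotsSurviving * v[2];
            activeShots = shotsSurviving + v[0];
            activePlayers = playersAfter;
        }
        return score;
    }

    public static void main(String[] args) {
        int[][] shots = {{1, 2}, {2, 3}, {4, 5}, {6, 7}};
        int[][] players = {{1, 5}, {2, 3}, {4, 7}, {5, 7}};
        System.out.println(solve(shots, players)); // prints 9
    }
}
```

    The iteration only touches indices where something starts or stops, which is exactly the skip-the-gaps behavior described above; against a dense index set, a plain array of counters would likely win.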