    Not exactly slower, if we look at the constants hidden behind the O-s. Say we sort the events; there are 4e5 of them, so we get 4e5 * log(4e5)/log(2) ≈ 7.44e6 comparisons, and the difference is no longer so drastic. Moreover, sorting uses data-dependent conditional branches at every step; each one, when incorrectly predicted, stalls the CPU pipeline for roughly 10 cycles, and during a sort they are mispredicted in about 50% of cases, which is expensive. The proposed approach is straightforward: the only conditional is "if ( max < x ) max = x", and since the running maximum saturates quickly, in most cases the branch is not taken and the CPU can predict it.
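    For illustration, here is a minimal C++ sketch of that single-branch pattern: bucket the events by coordinate instead of sorting, then scan once with a running maximum. The problem details (coordinate range, input format) are hypothetical, not the author's exact solution; it just shows why the one branch saturates and predicts well.

    ```cpp
    #include <cstdio>
    #include <vector>
    #include <algorithm>

    int main() {
        const int MAXC = 400000;              // assumed coordinate range
        std::vector<int> bucket(MAXC + 1, 0); // one slot per coordinate, replaces the sort

        int n;
        if (scanf("%d", &n) != 1) return 0;
        for (int i = 0; i < n; ++i) {
            int pos, val;                     // hypothetical event: (position, value)
            scanf("%d %d", &pos, &val);
            bucket[pos] = std::max(bucket[pos], val);
        }

        int mx = 0;
        for (int c = 0; c <= MAXC; ++c) {
            int x = bucket[c];
            if (mx < x) mx = x;               // the only data-dependent branch; saturates fast
        }
        printf("%d\n", mx);
        return 0;
    }
    ```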
    About the numbers: my sorting approach runs in 0.01-0.02s, while my implementation of this approach runs in 0.03-0.04s on strong test cases. That is probably because it has to use quite a lot of randomly accessed memory, 80MB instead of 3.2MB for the sorting approach, which leads to cache misses and TLB thrashing. HugeTLB could help a bit, had it been turned on on the testing server.
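    For reference, a sketch of what "turning on HugeTLB" would mean in code: on Linux one can back the big table with explicit huge pages via mmap with MAP_HUGETLB, assuming the kernel has huge pages reserved, which judge servers typically do not, hence the "had it been turned on".

    ```cpp
    #include <cstdio>
    #include <sys/mman.h>

    int main() {
        size_t len = 80ull << 20; // ~80MB, the table size mentioned above
        void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)"); // fails if no huge pages are reserved
            return 1;
        }
        // ... use p as the random-access table; fewer TLB entries are needed ...
        munmap(p, len);
        return 0;
    }
    ```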