With the rise of AI tools helping candidates game online hiring assessments and interviews, hiring teams are increasingly losing trust in the integrity of these evaluations. But before we blame AI for everything, let’s pause and unpack an essential nuance of integrity.
“Integrity in hiring is not so much about a candidate using AI or not. It is about whether they followed the rules or not.”
Each company has its own hiring philosophy, but at its core, integrity ensures that:
- The candidate is who they claim to be.
- They follow the test guidelines they agreed to.
- Everyone is evaluated fairly, no matter the hiring approach.
With the rise of AI, software development workflows are evolving to make AI an integral part of developers’ toolchains. To keep up with this change, hiring teams must balance two critical aspects:
- Fundamentals should still be assessed in isolation to evaluate foundational knowledge and core problem-solving skills.
- AI-assisted workflows should be assessed separately, focusing on how well the candidates integrate AI tools in real-world scenarios.
Most assessments today focus on fundamentals, often restricting AI usage to ensure a clear, unbiased skill signal. But with more “interview-helper” AI tools emerging, it’s getting harder to prevent their unauthorized use. Some of these tools even market themselves as “invisible” AI copilots. While these new-age tools unsettle even the hiring teams at tech giants, we at HackerRank haven’t lost sleep over them: integrity is deeply ingrained in how we operate, and we have been thinking about this problem from the beginning. Here’s why.
Over the years, we’ve built our Integrity stack to combat unfair online assessment and interview practices. But how well does it hold up against the latest “invisible” tools designed to bypass detection? To find out, we put it to the test.
Here’s a look at how we conducted our testing and what we discovered.
Selecting the tool
For this experiment, we tested InterviewCoder, one of the tools marketed as “invisible interview co-pilots.” It uses semi-transparent overlays to avoid appearing in screen recordings or screen-sharing sessions. Candidates can upload screenshots of coding problems and get AI-generated answers, complete with comments and explanations.
We evaluated it in two environments:
- HackerRank Test (Screen)
- Live interviews (HackerRank Interview)
Test methodology for Screen
- Test setup – InterviewCoder was tested on the HackerRank-certified “Backend Developer Hiring Test.” The test had the following three questions –
- 1 basic problem-solving coding question
- 1 intermediate REST API coding question
- 1 intermediate SQL question
- Integrity settings for the test – The test was taken with the new Proctor mode enabled. (Here’s a sneak peek of the Proctor mode, which will be available as an early release to select customers starting April 2025.) The following integrity settings were turned on for the test. These settings are turned on by default and are required for Proctor mode to be enabled.
| Setting | Value |
| --- | --- |
| Tab proctoring | On |
| Copy-paste detection | On |
| AI plagiarism detection | On |
| Image proctoring | Off |
- Attempting the test – The test was attempted without any effort from the candidate to solve the problems; the candidate relied solely on the tool for answers. The test was taken on a 16” MacBook Pro (16GB Apple M2 Pro).
- Taking help from the tool – We used the tool to generate solutions to the problem by uploading the problem screenshot. If the solutions did not work, we used the debug option available in the tool or resubmitted the problem to generate a new solution.
- Submitting the test – The test was submitted once the tool had done its best on every problem. When the tool couldn’t solve a problem, the submitted answer was left unchanged.
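The dependency between Proctor mode and its required integrity settings can be sketched as a simple validation rule. This is an illustrative sketch only; the setting names below are hypothetical, not HackerRank’s actual configuration API:

```python
# Hypothetical setting names; not HackerRank's real configuration keys.
REQUIRED_FOR_PROCTOR = {
    "tab_proctoring",
    "copy_paste_detection",
    "ai_plagiarism_detection",
}

def proctor_mode_allowed(settings: dict) -> bool:
    """Proctor mode requires every mandatory integrity signal to be on.
    Image proctoring is optional and may stay off, as in this test."""
    return all(settings.get(name, False) for name in REQUIRED_FOR_PROCTOR)
```

With the settings used in this test (image proctoring off, everything else on), the check passes and Proctor mode can be enabled.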
Test methodology for interviews (HackerRank Interview)
- We tested the usage of InterviewCoder in HackerRank Interview with the following question types.
- Easy and hard coding questions
- HTML/CSS/JavaScript
- Project-type questions (Angular, React, React Native, DevOps)
- Android (Kotlin, Java)
- Interview integrity settings were enabled for the interviews.
- The candidate was asked to share fullscreen during the interviews using HackerRank Screenshare.
- Candidates were selected who were not already proficient in the problems being asked, ensuring they would rely primarily on the tool for help.
- We tested Interview on two operating systems – macOS Sonoma 14.4.1 and Windows 10.
Findings
Test-taking experience
- Our experience with InterviewCoder was disappointing. It had many UX issues, including clunky navigation, poor usability, accessibility challenges, and broken basic functionality. Some of its features simply don’t work. It feels more like a “quick hack” than an actual product.
- One of the biggest frustrations was its unreliable performance. The tool frequently crashed and repeatedly threw HTTP 500 errors while attempting to generate solutions.
Screen test results
- Our AI plagiarism detection rated the attempt as “highly suspicious” on all three questions, with confidence scores above 0.99.
- The final test score was only 25%, with the tool correctly solving only the first question (basic problem-solving coding).
- The tool struggled with the “Intermediate” coding and SQL questions. Despite multiple attempts, it never generated a correct answer. We finally gave up and submitted the test without solving them, so both questions scored zero.
- The AI plagiarism service correctly flagged question one (basic problem-solving) for code similarity and external referencing.
- The plagiarism detection also rated questions two and three as highly suspicious, with confidence scores of 0.993 and 0.999, respectively. The report did not explicitly flag them because our backend logic avoids flagging questions where the candidate scored zero. Had any test case passed for those questions, they would have been flagged in the report.
Attached below is the screenshot of the report.
- Code playback pinpointed the exact moment the candidate referenced an external solution.
- A standout feature of HackerRank Proctor mode is session replay, which captures screenshots of the candidate using an external tool, providing clear, undeniable evidence of plagiarism.
- The tool is not invisible when the candidate takes the test on macOS.
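The report-level behavior described above, where highly suspicious questions are suppressed when the candidate scored zero, can be sketched as follows. The function name and threshold are illustrative assumptions, not our actual backend code:

```python
SUSPICION_THRESHOLD = 0.99  # illustrative cutoff for "highly suspicious"

def flag_in_report(confidence: float, tests_passed: int) -> bool:
    """Flag a question in the report only when it is highly suspicious
    AND the candidate earned some score on it. Zero-score questions are
    suppressed even at very high confidence."""
    return confidence > SUSPICION_THRESHOLD and tests_passed > 0
```

Under this rule, question one (solved, with confidence above 0.99) is flagged, while questions two and three (confidences of 0.993 and 0.999, but zero test cases passed) are not.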
Interview test results
- When candidates were on macOS, the InterviewCoder tool was clearly visible to interviewers through the HackerRank screenshare feature.
- On Windows, the tool did not appear in screen sharing, but any mouse interaction triggered an “Out of Interview” alert, which was automatically detected by our integrity monitoring system.
- Since live interviews go beyond just submitting the correct code, our interviewers identified several suspicious behaviors, including long pauses before responding, frequent glances at the screen, and reading silently while attempting to answer a question.
- Because these candidates weren’t proficient in the problems they were solving, they struggled with follow-up questions from interviewers.
- When a tool-generated solution did not work, candidates had to discard it and generate a new one with the tool. Even when prompted by the interviewer, they could not debug the existing solution or explain why they had changed their approach entirely.
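The Windows behavior above, where any mouse interaction with the overlay triggered an “Out of Interview” alert, amounts to flagging focus-loss events on the interview window. A minimal sketch, assuming a hypothetical client event stream of (timestamp, event) pairs:

```python
def out_of_interview_alerts(events):
    """Return the timestamps at which the candidate's focus left the
    interview window, e.g. by clicking into an invisible overlay tool.
    "window_blur" / "window_focus" are hypothetical event names."""
    return [ts for ts, event in events if event == "window_blur"]
```

Each returned timestamp corresponds to one “Out of Interview” alert the integrity monitoring system would surface to the interviewer.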
Our secret sauce – HackerRank Integrity stack
The current integrity stack effectively handles “invisible” tools and other threats. Let’s examine its key components.
- AI plagiarism detection
- AI plagiarism detection is the core foundation of our stack. This system uses machine learning to detect and flag instances of plagiarism. It considers multiple input signals from the candidate’s system and can catch even subtle signs of plagiarism that a human reviewer would likely miss.
- To stay ahead of evolving tactics, the system is continuously trained on new datasets.
- “The unique aspect of our plagiarism detection is its ability to flag potential referencing by the candidate (not just copy-pasting), regardless of the external source, be it Stack Overflow, InterviewCoder, or even ChatGPT running on a phone!”
- Our system has already been extensively tested and proven effective at detecting LLM-generated answers, including those from ChatGPT.
- Photo ID verification and Image analysis
- To address the first pillar of integrity, verifying candidate identity, we have Photo ID Verification and Image Analysis. These features help hiring teams confirm that the person taking the test is the actual candidate, preventing identity fraud and proxy test-taking.
- Proctor mode
- Proctor mode (planned for limited availability in April 2025) takes integrity to the next level with real-time monitoring. Think of it as an AI-powered proctor that observes candidate behavior throughout the assessment, tracks suspicious activity, and provides hiring teams with a detailed integrity report and the ability to replay the entire candidate session.
- Secure browser
- The “HackerRank Secure Browser” (planned for limited availability in May 2025) will be a native application that candidates must download and install on their systems to take the test.
- It will provide the highest level of control over the candidate’s system. This dedicated testing environment locks down the assessment window, preventing access to unauthorized tools and applications. A fully controlled test environment makes it exponentially harder for candidates to game the system.
- It will be one of the most effective defenses against these invisible tools, since it can block and terminate specific applications running on the candidate’s system.
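One way to picture how these layers work together is as a weighted aggregation of independent signals into a coarse verdict. The signal names, weights, and thresholds below are purely illustrative assumptions, not HackerRank’s actual scoring model:

```python
# Illustrative signal names and weights only; not HackerRank's real model.
WEIGHTS = {
    "plagiarism_confidence": 0.6,  # ML similarity/reference score in [0, 1]
    "proctor_suspicion": 0.3,      # session-monitoring anomaly score in [0, 1]
    "id_mismatch": 0.1,            # 1.0 if photo ID verification failed
}

def integrity_verdict(signals: dict) -> str:
    """Combine independent integrity signals into a coarse verdict."""
    score = sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())
    if score >= 0.7:
        return "highly suspicious"
    if score >= 0.4:
        return "needs review"
    return "clear"
```

The value of a multi-layered stack is that a candidate must defeat every signal at once; evading only the screenshare check, as InterviewCoder does on Windows, still trips the other layers.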
Together, these components form a robust, multi-layered defense that evolves alongside new threats. And we are not stopping there. We’re constantly working on new enhancements to make our Integrity Stack even stronger.
Roadmap for further improving integrity capabilities
- Our AI plagiarism detection is already highly effective, but we’re continuously refining it. Moving forward, we’ll enhance it by training our models on specific behavioral patterns associated with using AI-assisted cheating tools.
- While the session replay feature can already capture the usage of tools like InterviewCoder on macOS, the next release of Proctor mode will automatically analyze these screenshots and flag such tool usage in the reports, with no manual intervention required from hiring teams.
- We are also exploring the possibility of applying the integrity features available in Screen, such as AI plagiarism detection, to HackerRank Interview. This will involve training a dedicated AI model for interviews.
- Additionally, we’re evaluating how the HackerRank Secure Browser could be used for interviews and assessments, allowing us to gain even more control over the candidate’s system and block unauthorized tools in real-time.
Recommendations for Hiring Teams
- If you’re not already doing so, use HackerRank to conduct online assessments and interviews.
- Enable AI plagiarism detection for all tests.
- For early talent assessments (focused on basic coding skills), we recommend conducting them using either Proctor Mode or the HackerRank Secure Browser (post-release).
- Use real-world problems to assess candidates to get the right skill signals. We have Project-based questions and AI interviewer assessments in that category.
- Look beyond just correct answers; as our testing shows, candidates who rely on AI-generated solutions struggle to explain their thought processes. Focus on follow-up questions to gauge true understanding.
- Even without plagiarism detection, interviewers can assess comprehension based on how well a candidate explains their solution. Our tools provide added visibility to flag concerns.
Final thoughts
From a candidate’s perspective, tools like InterviewCoder are unreliable and expensive. From a hiring standpoint, our current and planned integrity stack of AI plagiarism detection, Proctor mode, and Secure Browser is a formidable defense against such tools. Our Interview integrity features are also quite effective. Even so, we will continue to improve our Integrity stack to prevent and detect such tools.
“Invisible” AI tools may evolve, but the real skills remain visible to the right systems and interviewers. At HackerRank, we’re continuing to evaluate these tools, strengthen our detection methods, and build assessments that reflect the reality of modern development.
To reiterate our belief: integrity in assessments is not about whether you use AI; it’s about whether you follow the rules of the game. We will constantly evolve our integrity stack to ensure the rules are followed, and to flag violations so everyone gets a fair chance.
We recognize that the hiring industry is still working out where AI use should be allowed. The hiring process needs to evolve to assess both fundamentals without AI and the ability to use AI well. That’s the true “next-gen” developer hiring process, and we are committed to building it. More on our “next-gen” hiring soon.