AI Spots ‘Cheating’ in Hiring Exams! What is HireRoo's New "Suspiciousness Score" Feature?
How AI's "Suspiciousness Score" Feature Works and How to Use It
HireRoo has released an AI-powered "Suspiciousness Score" feature as a new tool to accurately assess candidate skills. This article explains the background behind the feature's development, the mechanism of its judgment (the philosophy behind the algorithm), and specific ways to utilize it in the hiring process.
Background and Challenges of Development
Until now, hiring managers could inspect candidate behavior through data such as keystroke playback, the number of times a candidate left the test screen, and the number of pastes. However, detecting cheating (such as web searches or copy-pasting from generative AI) required engineers to review each report meticulously, which led to the following challenges:
- Heavy Review Workload: Checking every report thoroughly takes a significant amount of time.
- Difficulty in Evaluation: When generative AI or similar tools are used skillfully, it becomes difficult to assess a candidate's actual skills accurately.
The newly introduced "Suspiciousness Score" feature aims to solve these challenges, supporting hiring managers in making more efficient and accurate decisions.
Judgment Mechanism and Design Philosophy
This feature draws on more than 40,000 data points accumulated in HireRoo, with the training data manually labeled by engineers for potential cheating. The AI analyzes the behavioral logs recorded during the exam and automatically estimates the likelihood of cheating.
Designed to Emphasize "Recall"
This AI model is designed to emphasize "Recall," that is, the share of actual cheating cases that the model flags. In other words, its policy is to "detect suspicious signs without fail."
Design Intent: This is similar to the approach of a health checkup: "If there's even a slight suspicion of abnormality, further examination is recommended." As a result, some responses that are not actually fraudulent may be judged "suspicious" (false positives). This trade-off is deliberate: the goal is to keep "misses" (false negatives) as close to zero as possible.
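The recall-first trade-off described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the scores, the labels, and the two thresholds are invented, and the code does not represent HireRoo's actual model or data. It only shows how lowering a flagging threshold raises recall at the cost of more false positives.

```python
from typing import List, Tuple


def precision_recall(
    scores: List[float], labels: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (precision, recall) when every score >= threshold is flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and y for f, y in zip(flagged, labels))        # cheating, flagged
    fp = sum(f and not y for f, y in zip(flagged, labels))    # honest, flagged
    fn = sum((not f) and y for f, y in zip(flagged, labels))  # cheating, missed
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical model outputs and ground truth (True = cheating confirmed).
scores = [0.95, 0.80, 0.60, 0.40, 0.35, 0.20, 0.10, 0.05]
labels = [True, True, False, True, False, False, False, False]

strict = precision_recall(scores, labels, threshold=0.7)
lenient = precision_recall(scores, labels, threshold=0.3)

print(f"threshold 0.7: precision={strict[0]:.2f}, recall={strict[1]:.2f}")
print(f"threshold 0.3: precision={lenient[0]:.2f}, recall={lenient[1]:.2f}")
```

With these made-up numbers, lowering the threshold from 0.7 to 0.3 raises recall from 2/3 to 1.0 while precision falls from 1.0 to 0.6: no cheating case is missed, but two honest candidates are flagged and need a human check.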
Risk Avoidance Perspective
Mistakes in hiring (hiring someone who shouldn't be hired) can lead to significant losses for an organization. Our research estimates that a hiring mismatch costs more than 5 million yen per person and also harms the organization in other ways.
To minimize this risk, the judgment criteria are deliberately strict, and the feature assumes a workflow in which "if the suspiciousness score is high, a human will conduct a detailed check using playback, etc."
How to Use the Suspiciousness Score by Objective
Utilize the feature as follows, depending on the hiring phase:
1. Assessment (Detailed Evaluation) Phase
For reports flagged as suspicious, prioritize reviewing the playback and departure counts to verify that the demonstrated skills are genuine. If suspicious behavior is confirmed, it is effective to ask probing questions during the interview, such as:
- "Please explain the logic of this implementation."
- "Are there any other solutions you can think of?"
2. Screening Phase
- No Suspiciousness: As the possibility of cheating is low, proceed with evaluation based primarily on metrics such as score and rank.
- Suspiciousness Present: Either review the report individually or, if you cannot tolerate any cheating risk, disqualify the candidate.
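The screening rules above can be written down as a small decision function. This is a sketch under stated assumptions: `Report`, `Action`, `screening_action`, and the `tolerate_cheating_risk` parameter are hypothetical names invented for this example and are not part of any HireRoo API.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    EVALUATE_BY_SCORE = "evaluate mainly on score and rank"
    MANUAL_REVIEW = "review the report individually"
    DISQUALIFY = "disqualify the candidate"


@dataclass
class Report:
    """Hypothetical view of a test report; field names are assumptions."""
    candidate: str
    suspicious: bool  # whether the report was flagged as suspicious


def screening_action(report: Report, tolerate_cheating_risk: bool) -> Action:
    """Map a screening-phase report to the next step described in the text."""
    if not report.suspicious:
        # Low likelihood of cheating: rely on the usual metrics.
        return Action.EVALUATE_BY_SCORE
    if tolerate_cheating_risk:
        # Flagged, but the team is willing to check it by hand.
        return Action.MANUAL_REVIEW
    # Flagged, and the team cannot accept any cheating risk.
    return Action.DISQUALIFY


reports = [Report("A", suspicious=False), Report("B", suspicious=True)]
for r in reports:
    print(r.candidate, "->", screening_action(r, tolerate_cheating_risk=True).value)
```

Because the model favors recall, some flagged reports will be false positives, so the manual-review branch is the less risky default for teams that can afford the review time.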
Improving Accuracy and Providing Feedback
The suspiciousness score is being improved continuously. If a report is marked "suspicious" but you find nothing suspicious when you review it, please report this through the feedback function. Your feedback helps improve the accuracy of future judgments.
Future Outlook
Currently, the feature applies only to algorithm-based problems, but the following updates are planned:
- Improved Explainability: Indicating specifically "where and how" a response is suspicious.
- Expanded Target Formats: Extending the feature to other test formats, such as quiz and system design questions.
HireRoo will continue to strive for feature improvements to accurately assess candidate skills and prevent mismatches.