Our objective was to investigate AI-generated advice in HR decision-making. The provision of advice strongly influenced participants' behavior, and different advice characteristics shaped participants' decisions to varying degrees.
First, regarding the source of advice, we found that novices performed slightly better when receiving human advice. Furthermore, when comparing novices and experts, we observed that the difference between the percentage of participants who followed the human's correct advice and the percentage who followed the AI's correct advice was larger for novices. However, this tendency to disfavor the algorithm disappeared when controlling for covariates and did not emerge for more experienced participants. Overall, our results show little evidence of algorithmic appreciation or aversion. These findings are somewhat at odds with other studies indicating that people with less task expertise value algorithms more highly [15] or that task experts show an aversion to the algorithm [17]. However, this finding contributes to research showing that algorithmic aversion is task dependent, with more subjective tasks resulting in less reliance on the algorithm [37]. Although personnel selection should ideally be objective, unavoidable subjective tendencies in the process may contribute to the observed results.
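To make this source comparison concrete, the sketch below shows one way the adherence gap could be computed from trial-level data. It is a minimal illustration under an assumed data layout: the file name and the columns `group`, `source`, `advice_correct`, and `followed_advice` are hypothetical and do not come from the study's materials.

```python
import pandas as pd

# Hypothetical trial-level data: one row per decision, with columns
#   group           -> "novice" or "expert"
#   source          -> "human" or "ai"
#   advice_correct  -> True if the advice given was correct
#   followed_advice -> True if the participant followed the advice
trials = pd.read_csv("trials.csv")

# Adherence to *correct* advice, split by expertise group and advice source.
correct = trials[trials["advice_correct"]]
adherence = (
    correct.groupby(["group", "source"])["followed_advice"]
    .mean()
    .unstack("source")
)

# A positive gap means correct human advice was followed more often than
# correct AI advice, i.e. a tendency away from the algorithm.
adherence["human_minus_ai"] = adherence["human"] - adherence["ai"]
print(adherence)
```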
Second, the accuracy of advice played an important role throughout all experiments. When given incorrect advice, participants' performance consistently fell below the level of those who received no advice. When receiving correct advice, participants' performance improved slightly compared to baseline levels, supporting our hypothesis. Participants who followed incorrect advice rated its quality similarly to correct advice, suggesting that they did not recognize the advice as wrong and over-relied on it. Conversely, participants who overruled inaccurate advice rated its quality lower, indicating that they recognized its inaccuracy and actively ignored it. Although participants who received inaccurate advice rated overall advice quality and their confidence lower across all experiments (consistent with previous findings [38]), they were still unable to ignore the inaccurate advice and ultimately relied on it. Across all experiments, participants followed correct and incorrect advice in more than two-thirds of their decisions: about half of their decisions followed correct advice, and about one-tenth followed incorrect advice. Other studies have found similar effects [17,19], showing that people often follow advice regardless of whether it is correct [9,10,16]. Participants' tendency to over-rely on advice may be due to the advice serving as an anchor for decision-making. In such cases, participants accept the AI advice without considering contradictory information because they focus their attention on the aspects that are consistent with it [12]. Research has shown that adjusting one's decision toward an anchor (in this case, the advice) occurs independently of prior judgments about the advice and is unintentional [10]. This independent information processing may explain why people rely on incorrect advice even when they rate its overall quality as low and report low confidence in their decisions.
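The reliance figures above are simple proportions over all decisions. As a minimal sketch under the same hypothetical data layout as before, they could be tallied as follows:

```python
import pandas as pd

trials = pd.read_csv("trials.csv")  # same assumed layout as above

followed = trials["followed_advice"]
correct = trials["advice_correct"]

# Proportions over all decisions, mirroring the figures reported in the text.
print(f"followed any advice:       {followed.mean():.1%}")             # > two-thirds
print(f"followed correct advice:   {(followed & correct).mean():.1%}")   # ~ one half
print(f"followed incorrect advice: {(followed & ~correct).mean():.1%}")  # ~ one tenth
```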
Third, we tested whether explainability, that is, explaining how the AI system arrives at its predictions, can reduce over-reliance on incorrect advice and positively impact decision-making. Contrary to our hypothesis, manipulating explainability did not significantly reduce over-reliance on inaccurate advice or improve overall performance. Contrary to the common belief that explainability is a key element in improving interactions between AI systems and humans [6,24,39], this study shows little evidence of that. Receiving explanations of the model's predictions had a small effect on users' quality perceptions (Experiment 2b) and confidence (Experiment 2c) in AI-generated advice, but these effects were not consistent, and there was no improvement in task performance. Previous research has shown that heatmaps can be too abstract or complex [40], which may affect performance. Processing the additional information may have required the same or more cognitive resources than receiving no explanation [41]. Additionally, recent research has demonstrated that providing explanations along with AI advice increases task complexity [42], which can exacerbate cognitive load and negate the expected benefits of explanatory aids. Cognitive load refers to the limited mental resources an individual has available to process information; when cognitive load is high, people tend to process information superficially, prioritizing easily accessible data [43]. Therefore, the combination of time pressure and complex explanations may have prevented the expected positive effect of reducing over-reliance. We therefore removed the time limit to allow participants to focus more on the explanation and facilitate understanding. Explainable advice in the form of a saliency heatmap increased people's tendency to rely on correct advice when participants did not feel time pressure (see Figure S2), but reliance on inaccurate advice still did not decrease significantly. These findings are consistent with other studies showing that people are more likely to incorporate a given explanation into their decision if it reduces cognitive cost (e.g., if the explanation is simple, easy to understand, and requires fewer cognitive resources to process) [44].
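For illustration, the sketch below shows one common way a token-level saliency heatmap can be derived for a text classifier, via occlusion: mask each token and measure how much the prediction changes. This is a generic technique sketch, not the study's actual explanation pipeline; the scikit-learn style `predict_proba` interface and the `[MASK]` placeholder are assumptions.

```python
import numpy as np

def token_saliency(model, tokens, target_class):
    """Score each token by how much masking it changes the model's prediction.

    `model` is assumed to expose a scikit-learn style predict_proba(texts)
    returning class probabilities; `tokens` is a list of words from one resume.
    """
    base = model.predict_proba([" ".join(tokens)])[0][target_class]
    scores = []
    for i in range(len(tokens)):
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        prob = model.predict_proba([" ".join(masked)])[0][target_class]
        scores.append(base - prob)  # large drop = token mattered for the advice
    # Normalizing these scores and mapping them to colors yields the heatmap.
    return np.array(scores)
```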
Surprisingly, when we used a simpler explanation technique to test whether complexity contributed to the limited effect of explainability on reducing over-reliance, performance still did not improve, although the advice had a subtle impact on quality ratings and participants' confidence. In Experiment 2c, explainable advice had a positive impact on quality ratings only for correct advice, but increased confidence only for inaccurate advice. These initial results indicate that people may be better able to recognize incorrect advice when they receive a simpler explanation. In turn, being able to detect inaccurate advice more easily may increase participants' confidence in their decision to override it. These results contradict research suggesting that explanations increase reliance on incorrect AI advice [45] and make AI advice appear more trustworthy even when its accuracy is lower [29]. In our experiments, explainable AI advice did not increase over-reliance, but it also did not lead to improved performance. Overall, explainable AI advice in the form of saliency heatmaps only increased people's reliance on correct advice, compared with non-explainable AI advice, when presented without time constraints.
Tailoring an explanation to a specific user may be a first step toward reducing the complexity of a given explanation [42] and thus improving interaction with AI systems. Customizing explanations for specific users goes hand in hand with using explanations that are easier to understand, which may lower cognitive costs [44]. However, more research is needed to fully understand how, and under what conditions, explainable AI advice influences people's decisions.
Overall, the results of this study support our research question regarding the source of advice and our hypotheses regarding advice accuracy, but only partially support our hypotheses regarding explainable advice. These findings indicate that people tend to follow advice regardless of how it is presented. Relying on high-performing AI decision support systems may lead to better hiring decisions overall, and research shows that NLP models can perform personnel selection tasks as well as humans [46], but none of these systems is 100% accurate. Making AI advice more explainable had no significant impact on performance. This raises the important question of how to prevent over-reliance on incorrect AI advice, especially for critical tasks such as personnel selection. According to new EU regulations, AI systems for personnel selection will be considered 'high risk' and must be subject to strict requirements [47]. Since the EU framework will not apply until 2024 at the earliest and standards are still being developed, implementing high-quality, safe AI decision support tools in HRM and other high-risk areas will require considerable effort. It is essential to explore solutions that reduce over-reliance on AI advice.
Practical implications
The practical implications arising from this study are multifaceted. First, given the consistent finding in this study and many others that people rely even on inaccurate advice (e.g., [10,17]), the quality of advice will be a key factor to consider when establishing robust regulations and standards. Regulatory frameworks such as those currently being developed by the EU [47] must consider the impact of advice quality on users' interaction with these systems to ensure safe implementation. For example, one approach is to display AI advice only when a certain threshold of system certainty is exceeded, potentially reducing the risk of presenting incorrect advice. Second, from the user's perspective, careful consideration should be given to how explanations are presented and to the level of engagement expected from users, in order to reduce the risk of blind reliance on the advice given. Given the limited effectiveness of the explanation methods observed in our study, encouraging users to approach explanations analytically may help them avoid blind acceptance of advice [48].
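A confidence-gating rule of this kind is straightforward to implement. The sketch below illustrates the idea; the 0.90 threshold, function name, and message format are purely illustrative and would need calibration against the deployed model's actual certainty estimates.

```python
from typing import Optional

CONFIDENCE_THRESHOLD = 0.90  # illustrative; would need empirical calibration

def advice_to_display(label: str, confidence: float) -> Optional[str]:
    """Return advice text to show the recruiter, or None to withhold it."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"AI suggests: {label} (confidence {confidence:.0%})"
    return None  # below threshold: the human decides unaided

print(advice_to_display("suitable", 0.95))    # shown
print(advice_to_display("unsuitable", 0.62))  # withheld -> None
```

Withholding low-certainty advice trades coverage for safety: the system helps on fewer decisions, but the decisions it does influence are less likely to anchor users to an incorrect recommendation.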
Limitations and future research
The current study had several limitations that provide opportunities for future research. First, participants knew that making inaccurate decisions would have no negative real-life consequences, so their motivation to perform well may have been limited. Future research could address this issue by asking participants to provide short justifications for their decisions, encouraging more conscientious task performance. Second, although short review times were chosen to ensure external validity, in practice recruiters typically act more autonomously when reviewing resumes. Although unlimited review time did not significantly affect our results, further research should investigate the effect of time pressure. Third, in practice, suitability is usually judged holistically, so recruiters have more discretion when weighing selection criteria; however, to obtain clear and comparable performance measurements in our experiments, we had to apply strict decision rules. Finally, some effects were no longer significant after adjusting for covariates. This suggests that these effects are not stable and may vary with study characteristics such as methodology. Because it is important to understand the robustness of these effects, future research should examine them in more detail.