When the algorithm gets it wrong, managers look away

While AI is meant to take the bias out of performance reviews, research shows managers are less likely to use it when it delivers a rating they do not want

In 2021, Bloomberg reported that Amazon delivery drivers were receiving automated termination emails, fired by an algorithm and with no human manager involved. The system had watched them constantly, tracking speed, idle time, seatbelt use and customer interactions, and issued performance ratings that ended careers without a human ever reviewing the decision.

At IBM, the approach was different in kind but similar in principle. In 2018, Bloomberg reported that IBM’s Watson system assessed employee productivity and forecast future performance, with those outputs feeding directly into decisions about who received a pay rise and who did not.

These were not pilot programmes or experiments. They were live and at scale, representing a growing reality for millions of workers worldwide. The promise behind them was the same in both cases: replace the subjectivity and relationship-driven bias of human judgement with something more consistent, more objective, and ultimately fairer. It was a compelling argument.

UNSW Business School Professor Kerry Humphreys said even a well-designed system risks becoming another layer that appears objective, but can quietly reproduce bias in the performance evaluation process. Photo: UNSW Sydney

These cases pointed to an underlying behavioural question: “When given the choice, do managers defer to algorithmic judgment when evaluating employee performance, or does our human judgment prevail?” said Kerry Humphreys, a Professor in the School of Accounting, Auditing and Taxation at UNSW Business School. “While algorithms may be intended to reduce subjectivity and potential bias in human judgments, their effectiveness depends not only on their design, but on whether people are willing to follow their advice.”

How bias re-enters through the side door

New research suggests that when managers were given a choice about whether to follow the algorithm's recommendations, the human instinct to protect workplace relationships quietly reasserted itself, and the bias the algorithm was meant to eliminate came back through the side door.

The paper, Tough Ratings, Tougher Sell: How Different Types of Adjustment Affect Managers’ Asymmetric Algorithm Use in Performance Evaluation Judgments, was authored by Dr Fangbin Lin from The University of Western Australia, and UNSW Business School Professors Mandy Cheng and Kerry Humphreys.

Published in The Accounting Review, the study used a controlled experiment with 242 experienced managers, each of whom was asked to evaluate the performance of a hypothetical employee using an algorithm that produced a customer relationship score. Participants were randomly assigned to conditions where the algorithm recommended either a high or a low rating, and where they were given different levels of control over the algorithm: no adjustment rights, the right to adjust the final score, or the right to adjust the underlying calculation process.

The results were stark. When the algorithm recommended a high rating, approximately 60% of managers used it. When it advised a low rating, that figure fell to around 42%. The algorithm was the same; only the outcome changed. Supplementary analysis confirmed that managers’ reluctance to use the low-rating algorithm was associated with their concern about damaging their relationship with the employee, rather than any considered view that the algorithm was technically flawed.

This finding matters because it reveals a paradox at the heart of algorithmic performance management. Organisations invest in these tools to reduce bias. But if managers selectively use algorithms only when the results are favourable, the bias does not disappear. Rather, it is essentially laundered through the tool. The algorithm’s presence can actually make the leniency problem worse by providing a veneer of objectivity to what remains a subjective, relationship-driven process.

“If AI appears to be used consistently, but managers are quietly avoiding tougher calls to keep relationships intact, it has real consequences for people’s careers,” Prof. Humphreys explained. “These ratings aren’t just paperwork – they decide who receives bonuses and promotions, and who receives developmental support needed to improve. When underperformance is glossed over, employees can drift along under a supportive manager, only for the reality to catch up with them when a new manager steps in.”

The problem with letting managers choose

When managers have the option to use an algorithmic performance rating (or ignore it and rely on their own judgement), they do not treat the algorithm neutrally. They treat it as a convenient tool when it tells them what they want to hear, and an inconvenient one when it does not.

This reflects leniency bias, a tendency well established in the management and accounting literature: research has long shown that most managers avoid giving low performance ratings. The reasons are practical and human: nobody wants to justify a poor rating to an unhappy employee, manage the fallout, or damage a working relationship they depend on. Studies have found that between 60% and 70% of employees typically receive ratings in the top two performance levels, regardless of the actual distribution of performance.

What has been less understood until now is whether the presence of an algorithm changes this behaviour. The short answer, according to this research, is that it does not – unless organisations design their systems carefully.

If managers cannot engage with how an algorithm reaches its conclusions, they will abandon it when it matters most: when it identifies underperformance that needs addressing. Photo: Adobe Stock

Why tweaking the final score does not help

One obvious response to this problem is to allow managers to adjust the algorithm’s output, nudging a score up or down by a small margin. Prior research in other settings, such as sales forecasting, suggests that this gives managers a sense of control and increases their willingness to use algorithmic tools.

In the performance evaluation context, the research found that this approach did not work. Managers who were allowed to adjust the final score were no more likely to use the algorithm when it gave a low rating than managers who had no adjustment rights at all. Simply letting someone modify a number they disagree with does not make them trust the process that produced it.

The mechanism here is important. When an algorithm gives a result a manager dislikes, that manager starts asking questions. Why did it come up with this? What data went in? Does it understand the nuances of this particular employee? Allowing them to change the final number answers none of those questions. They still do not understand how the algorithm works, and so they still distrust it.

The fix: let managers inside the algorithm

What did work was giving managers the ability to adjust the algorithm’s computation process itself. In the experiment, this meant changing the relative weight the algorithm assigned to different components of customer feedback. The researchers call this “process adjustment”, as distinct from “output adjustment”.
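To make the distinction concrete, the minimal Python sketch below models a weighted customer relationship score. The component names, scores, weights and the five-point adjustment cap are hypothetical illustrations, not details of the study's actual algorithm. Output adjustment only nudges the final number; process adjustment changes the weights and recomputes the score, so a manager can trace how each input drives the result.

```python
# Hypothetical feedback components for one employee (scores out of 100).
# Names and values are illustrative assumptions, not data from the study.
COMPONENTS = {"responsiveness": 55.0, "product_knowledge": 70.0, "courtesy": 40.0}

# Hypothetical default weights the algorithm assigns to each component.
DEFAULT_WEIGHTS = {"responsiveness": 0.4, "product_knowledge": 0.3, "courtesy": 0.3}


def weighted_score(components: dict[str, float], weights: dict[str, float]) -> float:
    """Combine component scores using weights normalised to sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(components[name] * weights[name] / total for name in components)


def output_adjustment(score: float, nudge: float, cap: float = 5.0) -> float:
    """Output adjustment: shift the final score by at most +/- cap points.
    The manager changes the number but learns nothing about why it came out that way."""
    return score + max(-cap, min(cap, nudge))


def process_adjustment(weights: dict[str, float], changes: dict[str, float]) -> dict[str, float]:
    """Process adjustment: change the relative weight of individual components.
    Returns a new weight dict (the original is left untouched); negative weights are clamped to 0."""
    adjusted = dict(weights)
    for name, delta in changes.items():
        adjusted[name] = max(0.0, adjusted[name] + delta)
    return adjusted


if __name__ == "__main__":
    base = weighted_score(COMPONENTS, DEFAULT_WEIGHTS)
    print(f"algorithm's score: {base:.1f}")  # 55.0
    print(f"after output adjustment: {output_adjustment(base, 8):.1f}")  # 60.0 (nudge capped at +5)
    new_weights = process_adjustment(DEFAULT_WEIGHTS, {"courtesy": -0.1, "product_knowledge": 0.1})
    print(f"after process adjustment: {weighted_score(COMPONENTS, new_weights):.1f}")  # 58.0
```

The contrast mirrors the research finding: `output_adjustment` returns a different number without explaining anything, while `process_adjustment` makes the manager decide which inputs matter and see the consequence of that choice.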

When managers in the low-rating condition were given this type of control, their willingness to use the algorithm rose to 63%, which is essentially equal to the rate at which managers used it when it gave a high rating. The leniency gap disappeared.

The reason, the researchers found, was that engaging with the algorithm’s calculation process gave managers a sense of understanding how it worked. When they could see the inputs, adjust the weighting, and observe how their changes affected the output, they felt the algorithm’s reasoning was more like their own. That perception of understanding drove trust, and trust drove use.

Critically, this effect occurred only when the algorithm assigned a low rating. When the algorithm gave a high rating, managers accepted it at face value, regardless of what type of adjustment rights they had. There was no motivation to scrutinise a result they were happy with. This asymmetry is consistent with motivated reasoning, which is a well-documented tendency for people to apply more critical thinking to conclusions they do not like.

What this means for organisations deploying AI in HR

The implications for organisations building or buying algorithmic performance tools are significant.

Designing for process transparency should be a priority, not an afterthought. If managers cannot engage with how an algorithm reaches its conclusions, they will abandon it when it matters most, which is when it identifies underperformance that needs addressing. The researchers note that the growing emphasis on “explainable AI” in the technology sector is well-aligned with this finding, and that allowing limited process adjustment may be viable even for more complex algorithms.

Training, interface design, and transparency are the key levers organisations should pull to improve managers' usage of algorithmic performance management tools. Photo: Adobe Stock

Allowing managers to adjust only the final rating is likely to be ineffective as a design choice in performance evaluation settings, and the research suggests it may actually provide false reassurance to organisations that believe they have addressed the problem of managerial override.

The leniency problem does not go away with algorithms alone. Even where managers have access to a data-rich, objective scoring tool, their instinct to protect relationships with employees remains a powerful force. Organisations that treat algorithm adoption as a solution to rating inflation, without considering how managers interact with that algorithm, may find they have invested significantly for limited gain.

Building in structured review processes (such as calibration committees or accountability mechanisms) can complement process adjustment rights. The research notes that such controls may amplify the benefits of well-designed algorithmic systems by reinforcing managers’ commitment to accurate evaluation.

Finally, understanding is the key variable (not control). The research showed that neither the amount of time managers spent experimenting with the algorithm nor their abstract sense of autonomy drove greater use. It was specifically their perceived understanding of how the algorithm reasoned. This points to training, interface design, and transparency as the levers organisations should pull, rather than simply expanding or contracting the scope of managerial discretion.

For organisations, the insights are clear, according to Dr Fangbin Lin. “Managers can be given superficial control over algorithmic performance ratings, but organisations may benefit more by instead giving those managers visibility into and control over how algorithms work to evaluate employee performance,” said Dr Lin, who holds a PhD in Accounting from the UNSW Business School.

“Algorithmic tools that expose and enable engagement with an evaluation process are, therefore, critical to narrowing the gap between investing in performance evaluation innovations and actually benefiting from those innovations,” he said. He noted that when managers are better equipped to make and stand behind difficult judgments, underperforming employees are more likely to gain access to the resources they need to develop and improve.

Prof. Humphreys added that organisations can also monitor how their managers use algorithms, looking for patterns of selective override rather than assuming adoption equates to impact. “Without this, even well-designed systems risk becoming another layer that appears objective, but quietly reproduces bias in the performance evaluation process,” she concluded.
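One way to act on this advice is to compare how often each manager accepts the algorithm's recommendation when it is high versus when it is low. The minimal Python sketch below assumes a hypothetical log of (manager, recommendation, used) records and a 10-percentage-point threshold chosen purely for illustration. A persistent gap of the kind the study observed across managers (around 60% versus 42%) would be a prompt for a closer look, not proof of bias on its own.

```python
from collections import defaultdict

# Hypothetical review-log record: (manager_id, recommendation, manager_used_it).
# recommendation is assumed to be either "high" or "low".
Record = tuple[str, str, bool]


def usage_rates(records: list[Record]) -> dict[str, dict[str, float]]:
    """Return each manager's acceptance rate for 'high' and 'low' recommendations."""
    counts = defaultdict(lambda: {"high": [0, 0], "low": [0, 0]})  # [times used, total seen]
    for manager, recommendation, used in records:
        tally = counts[manager][recommendation]
        tally[0] += int(used)
        tally[1] += 1
    # Only report a rate for recommendation types the manager actually received.
    return {
        manager: {rec: used / total for rec, (used, total) in recs.items() if total}
        for manager, recs in counts.items()
    }


def flag_selective_override(records: list[Record], threshold: float = 0.10) -> list[str]:
    """Flag managers who accept high ratings noticeably more often than low ones.
    Managers who have not seen both kinds of recommendation are skipped."""
    flagged = []
    for manager, rates in usage_rates(records).items():
        if "high" in rates and "low" in rates and rates["high"] - rates["low"] > threshold:
            flagged.append(manager)
    return sorted(flagged)


if __name__ == "__main__":
    log = [
        ("A", "high", True), ("A", "high", True), ("A", "low", False), ("A", "low", True),
        ("B", "high", True), ("B", "low", True),
    ]
    print(flag_selective_override(log))  # ['A']: accepts 100% of high vs 50% of low ratings
```

Comparing each manager against their own acceptance of high ratings, rather than against a company-wide quota, keeps the check focused on the asymmetry the research describes.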

Republish

You are free to republish this article both online and in print. We ask that you follow some simple guidelines.

Please do not edit the piece. Ensure that you attribute the author and their institute, and mention that the article was originally published on BusinessThink.
