The human in the loop – or not

Neville Hobson, 18 May 2026

There is a phrase that has been doing a lot of work lately. You hear it in boardrooms and conference keynotes, in policy papers and press releases. Human in the loop. It sounds reassuring. It implies oversight, judgement, and accountability.

It suggests that, however sophisticated the AI, there is still a person – a thinking, responsible, experienced person – making sure things are right before they go out into the world.

I am beginning to wonder, though, whether that phrase has become a kind of comfort blanket. Something organisations say rather than something they do.

EY has become the latest professional services firm to discover, publicly and painfully, what happens when the loop turns out to be empty.

Last week, the Financial Times reported that EY's Canadian operation had published a study on cybersecurity risks in loyalty reward programmes – Points of Attack: Uncovering Cyber Threats and Fraud in Loyalty Systems – which was being used to market the firm's services.

Researchers at AI detection platform GPTZero examined it and found what should have been impossible to miss: contradictory market figures, citations pointing to pages that didn't exist, and a reference to a McKinsey report that, as far as anyone can tell, has never existed.

More than half the footnotes were, in some form, fabricated, broken or misattributed – a large-scale case of hallucinated citations, a practice known as "vibe citing." (Get used to that phrase.)

EY removed the study from its website and issued a statement. The firm "takes the accuracy of all the content we publish seriously," it said, and has "an organisation-wide commitment to the responsible use of AI." That commitment, apparently, did not extend to clicking the footnotes.

When the loop is empty

I want to be careful here, because it would be easy – and not entirely accurate – to make this simply about EY. Last December, I wrote about a similar failure at Deloitte, in which reports for the Australian government and a Canadian provincial government were found to contain fake citations. In April, law firm Sullivan & Cromwell apologised to a New York court after a filing repeatedly misquoted the US bankruptcy code and cited cases that didn't hold up.

Three separate organisations, three separate contexts, the same essential failure. At some point, the pattern becomes the story.

And the pattern is not really about AI hallucinations. That framing lets everyone off too lightly. The tools behaved more or less as the tools behave – generating plausible-sounding content, including plausible-sounding citations, which is something large language models are known to do. The failure was not in the generation. It was in what happened – or didn't happen – afterwards.

This is where I keep returning to that phrase. Human in the loop. What does it actually require?

It requires, at a minimum, that someone reads the output. Not skims it – reads it. It requires that they check every reference, whether there are five or fifty-five, rather than simply assuming each one is sound. It requires that they have both the time and the authority to push back, to say this doesn't look right, to delay publication until it does. And it requires a culture in which rigour is valued over speed, and in which the pressure to demonstrate AI productivity does not quietly override the discipline of verification.

EY, it's worth noting, had publicly boasted that its AI-related revenue had grown 30 per cent in the previous year, and that 15,000 staff had worked on AI client projects.

AI adoption, in other words, was something the firm measured and celebrated. I think it's worth asking what internal environment that kind of metric creates – and whether "responsible use of AI" sits comfortably alongside targets that reward adoption above all else.

Three researchers, three hours, one awkward irony

There is a detail in this story that I keep coming back to. The researchers at GPTZero – three of them – identified the problems and wrote them up. They did, in a relatively short time, what EY's own processes apparently never did. They read it. They clicked the links. They noticed that the loyalty scheme market and the unclaimed loyalty points were both estimated at exactly $200 billion, in the same report, which is the kind of thing that jumps out when you are actually paying attention.

And there is a small irony in the subject matter that is hard to ignore. The report was about cybersecurity – a discipline built on the principle that you do not trust without verifying. You assume the threat is real until you can demonstrate otherwise. You check. You question. You do not take the output at face value.

Someone at EY Canada apparently decided that principle applied to loyalty scheme fraud, but not to their own work.

What it actually takes

I use AI tools. I use them regularly, and I find them genuinely useful. I also know from experience that they require active engagement. You cannot treat the output as finished. You have to read it as if it might be wrong, check the things that can be checked, and be prepared to go back and ask again. That takes time. It takes judgement. It requires knowing enough about the subject to recognise when something doesn't add up.

Which brings me to the question I keep asking myself, not entirely rhetorically: what will it take?

Not what technology will fix this – I don't think that's where the answer lies. Better tools won't solve a human accountability problem.

What will it take for organisations to be honest with themselves about what "human in the loop" actually demands? About the conditions – the time, the expertise, the culture – that make genuine oversight possible? About the gap between what they say in press statements and what their internal incentives actually reward?

I don't have a tidy answer. I'm not sure one exists. But I think the question is overdue, and I notice that EY, Deloitte and Sullivan & Cromwell have all, in their different ways, just made the case for asking it again.
