Whose English Does AI Speak?

AI tools are trained on English, but not the kind most people speak. This post explores how generative AI reinforces American norms, often at the expense of global linguistic diversity. So, whose English is it really speaking?

AI large language models by Wes Cockx & Google DeepMind / https://betterimagesofai.org / CC-BY 4.0

In April, I wrote about the divergence between UK and US English, pondering whether the growing dominance of American spelling, syntax and usage online might be steering us towards a homogenised "International English."

I suggested that English, though shared, is fracturing in practice while simultaneously being compressed into a singular, tech-favoured form.

But that earlier post only scratched the surface.

An article last week in The Conversation opened up the next layer of this topic, examining how English is used in generative artificial intelligence and raising a far more pressing question: not just how English is changing, but whose English AI systems are actually built on.

The answer, of course, is almost entirely mainstream American English. This isn't just a linguistic curiosity – it's a design choice with profound implications.

The American English default

Roughly 90% of the training data used in today's generative AI systems is in English, The Conversation's report notes. That alone is striking, but what’s more telling is which English is privileged.

As the article makes clear, Silicon Valley's giants – Google, OpenAI, Meta, Microsoft – have trained their systems on data dominated by US-based media, forums and online content.

Unsurprisingly, this means that what AI models learn to understand and replicate is not the rich diversity of Englishes spoken around the world, but a polished, standardised, American version of it.

This has real-world consequences.

When an AI writing assistant "corrects" your spelling or flags your grammar, it's likely using American norms as its benchmark. When speech recognition software struggles to understand Indian, Nigerian or Aboriginal English accents and syntax, it’s because those patterns were under-represented – or absent – in the training data.

The risk is that entire communities are being misunderstood, misrepresented or outright excluded from digital experiences.
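The effect of a US-skewed benchmark is easy to illustrate. The sketch below is purely hypothetical (a toy wordlist, not any real product's dictionary), but it shows the mechanism: a checker that has only ever seen US-centric data will flag perfectly valid British or Indian English words as errors.

```python
# Hypothetical, deliberately tiny vocabulary standing in for a
# US-dominated training corpus; real systems are vastly larger
# but can exhibit the same blind spots.
us_trained_vocab = {"color", "organize", "analyze", "truck"}

def flag_unknown(words, vocab):
    """Return the words this toy checker would mark as 'misspelled'."""
    return [w for w in words if w.lower() not in vocab]

# "colour" (British) and "prepone" (Indian English) are valid words,
# yet a US-only vocabulary treats them as noise to be corrected.
print(flag_unknown(["color", "colour", "prepone"], us_trained_vocab))
```

The point is not the code but the asymmetry it encodes: whatever the vocabulary omits is, by construction, an "error".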

As The Conversation piece points out, linguistic diversity is often treated as noise in AI systems – something to clean up or normalise – rather than a valid signal of cultural richness.

But English isn’t – and has never been – a monolith.

From Indian English’s inventive lexical contributions like “prepone,” to the unique syntactic rhythms of Aboriginal English, these are legitimate, fully formed systems shaped by culture, history and necessity.

The Conversation's article makes a compelling call for linguistic justice in AI: systems that adapt to users, not the other way around. That means involving linguists, educators and communities in building models that reflect the full spectrum of how English is spoken, not just how it's written in the New York Times or spoken in Silicon Valley.

Reclaiming English in AI

This mirrors the core idea in my April post: that English is constantly evolving. There, I explored the question through a cultural and social lens, drawing on findings from YouGov research.

This follow-up sharpens the focus on technology's role in that evolution – and the danger of allowing convenience and corporate dominance to define what "correct English" means for billions of users.

The path forward lies in building tools that reflect how people actually speak, not how Silicon Valley expects them to.

So the next time a chatbot misunderstands your phrasing or a grammar checker offers a suspiciously American correction, ask yourself: whose English is this AI speaking? And more importantly: whose isn’t it?