Pleias and the GSMA have released CommonLingua, an open-source language identification (LID) model designed for 61 African languages.
Released under the GSMA’s "AI Language Models in Africa, by Africa, for Africa" initiative, the tool addresses a bottleneck in the artificial intelligence (AI) pipeline where global systems fail to identify African text correctly before training begins.
Existing LID systems, such as Meta’s fastText or GlotLID, often mislabel African content as English or French, even though the continent is home to more than 2 000 living languages.
CommonLingua aims to resolve this by operating directly on UTF-8 byte sequences, allowing it to process scripts like Ethiopic, N’Ko, and Tifinagh without relying on Western-centric tokenisers.
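To illustrate what byte-level input means in practice, the short Python sketch below converts sample words in the three scripts into UTF-8 byte values. It is not CommonLingua's code: the sample words and the helper name `to_byte_ids` are assumptions chosen for illustration, and the article does not describe how the model consumes the bytes.

```python
# Illustrative sketch only: shows how text in any script reduces to
# integers in the range 0-255, with no script-specific tokeniser.
# The sample words and the helper name are assumptions for this example.

samples = {
    "Ethiopic": "ሰላም",
    "N'Ko": "ߒߞߏ",
    "Tifinagh": "ⵜⴰⵎⴰⵣⵉⵖⵜ",
}


def to_byte_ids(text: str) -> list[int]:
    """Return the UTF-8 encoding of `text` as a list of byte values."""
    return list(text.encode("utf-8"))


for script, word in samples.items():
    ids = to_byte_ids(word)
    # Every script maps onto the same 256-symbol vocabulary.
    print(f"{script}: {len(word)} characters -> {len(ids)} bytes, first bytes {ids[:6]}")
```

Because every character becomes one or more values from the same 256-symbol alphabet, a byte-level model needs no vocabulary entry for a given script, at the cost of longer input sequences for scripts whose characters take two or three bytes each.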
A recent ITWeb report found that 59% of African companies are preparing to spend more than $50 million on AI in 2026. The report also notes that 84% of African CEOs are highly optimistic about the technology’s potential.
“African languages are not an edge case. They are the working languages of hundreds of millions of people, and they deserve AI infrastructure built with the same care as any other language. CommonLingua is deliberately the first brick we are laying: you cannot curate what you cannot identify,” said Pierre-Carl Langlais, co-founder and chief technology officer, Pleias.
The launch comes as the African AI market shifts toward large-scale execution. While CommonLingua is a specialised tool, it enters an ecosystem where global and local players are attempting to solve the low-resource language challenge.
By providing an open-source foundation, Pleias and the GSMA are enabling a shift toward "compliance-as-a-service" and multilingual customer engagement. These are cited as the next major drivers for regional AI adoption.