The challenges of building AI in Africa
As machine learning and AI play ever larger roles in our lives, what foundations are being laid in Africa?
Vukosi Marivate, the Absa data science chair at the University of Pretoria, is well placed to answer this question, having been instrumental in setting up a number of initiatives. One of these is the Deep Learning Indaba, which he co-founded and which is now the largest AI/ML workshop on the continent. Another is the Masakhane natural language processing project, which aims to strengthen research in African languages.
But as he told the audience at the ITWeb Business Intelligence Summit in Sandton this week, there is a dearth of R&D funding, resources and data for AI and ML on the continent.
By way of example, he cited recent figures from Wikipedia showing that there were over six million English entries and around 90 000 in Afrikaans. In sharp contrast, there were 8 000 entries in Sepedi. Tshivenda was even less represented, with 367 entries, and there were none at all for Ndebele, which he said was his wife's mother tongue.
“When you think about machine learning and AI, you need data. You’re not going to have an Ndebele Google Assistant at the moment. That data doesn’t exist. The knowledge graphs for all these (AI) systems use Wikipedia itself or Wikidata or a combination of the two.”
“Language captures indigenous knowledge, and then culture. We have colonial legacies in terms of the ways that languages have developed.”
Data colonialism
It is also important, he said, not simply to import technology into a country.
“If you’re just buying the technology you’re not building the skills. It’s becoming a tool of control and exploitation. If you just become a source of data, then you’re not controlling it. People are just extracting (data) and not really giving back,” he said, adding that data colonialism was now being discussed more widely in the AI community.
He said that government policy generally trailed the development of technology. In South Africa, for instance, it was not clear where responsibility lay for the deployment of facial recognition.
“The people building this live in cities like San Francisco where these technologies are banned. The engineers and research scientists who are building these models live in a city that doesn’t allow these models. They’re protected, but their clients aren’t.”
Marivate is focused on building machine learning and data science communities, and it is here that he has been particularly effective.
He recalled that when the inaugural Deep Learning Indaba was held in Johannesburg in 2017, hundreds of young people applied to attend. The Masakhane project followed in 2019, and in turn inspired other initiatives to take root around the continent.
“Communities can take these technologies and shape them and build them for themselves. We’re doing research on African languages, for Africans, by Africans.”