Although Africa is home to a vast array of languages, many remain absent from the burgeoning field of artificial intelligence (AI) development. The growth of AI tools like ChatGPT has largely favored languages with extensive written resources, such as English, other European languages, and Chinese, while African languages, which are predominantly spoken and less well documented, have been left behind. This linguistic gap excludes millions of Africans from the benefits of AI.

To tackle this issue, researchers have recently unveiled what may be the largest dataset of African languages aimed at training AI. Prof. Vukosi Marivate from the University of Pretoria emphasizes that understanding and interpreting the world in one's native language is crucial, and when technology fails to reflect this reality, entire communities risk being marginalized.

The African Next Voices initiative has brought together linguists and computer scientists to create datasets for 18 African languages, including Kikuyu, Hausa, and isiZulu. Although these 18 are only a small fraction of the continent's many languages, the effort marks a significant step towards inclusivity in AI. Over two years, the project has amassed more than 9,000 hours of speech data across various contexts, from agriculture to healthcare.

Kelebogile Mosime, a farmer in South Africa, uses an AI application that communicates in her native language, Setswana, to address challenges on her farm. Her experience highlights the practical advantages of technology that works in users' own languages. Similarly, the South African startup Lelapa AI is developing AI solutions tailored for banks and telecoms that cater to non-English speakers, recognizing that language can be a barrier to essential services.

Prof. Marivate warns that the neglect of indigenous languages not only restricts access to technology but also threatens to erase the cultural narratives and perspectives that shape identities. The ongoing effort to close the AI language gap aims to ensure that technological advances reflect and serve Africa's linguistic diversity.