This article was originally published on The Conversation, an independent and nonprofit source of news, analysis and commentary from academic experts. Disclosure information is available on the original site.
___
Author: Anna Luisa Daigneault, PhD Student in Linguistic Anthropology, Université de Montréal
If a language has only a few speakers left, how does a community revive it? In our current era, 3,000 languages are at risk of extinction due to the pressures of colonization, globalization, forced cultural assimilation, environmental devastation and other factors.
According to Canada’s Commission for Indigenous Languages, “research shows that no Indigenous language in Canada is safe and that all are in varying stages of endangerment.”
Our society is also being shaped by the rapid rise of artificial intelligence. Can AI be used for the benefit of Indigenous language survival in Canada and elsewhere?
According to the World Economic Forum, most AI chatbots are trained on only 100 of the world’s 7,000 languages. English is the main driver of most large language models.
This scenario leaves the bulk of the world’s languages in the dust. In the coming years, will AI contribute to language revitalization, or to language oppression?
A language in a box
In a 2023 TEDx talk, Northern Cheyenne computer engineer Michael Running Wolf shared his design of a cedar box that looks both ancient and contemporary. He described the dragonfly-adorned device as a “cedar-enclosed, offline Edge AI that contains the inner workings of a minimal voice-based language curricula, in other words, a language in a box.”
He proposed that conversational AI technology, much like Amazon Alexa or Google Home, could help language learners improve their fluency.
Running Wolf is the technical director of the First Languages AI Reality initiative at the Québec Institute for Artificial Intelligence. The program supports Indigenous scholars and technologists in developing innovative solutions to language loss.
Voice-controlled tools trained via machine learning could serve as AI assistants for speakers who wish to hear unfamiliar sounds pronounced accurately and to practise their own pronunciation. This technology could offer a new way to support oral transmission, which is crucial when few fluent speakers are left.
At the heart of Running Wolf’s project is Indigenous data sovereignty, the principle that Indigenous people retain control over their own data.
A place in the digital world
On the other side of the world, in the Philippines, AI scholar and politician Anna Mae Yu Lamentillo is on a quest to support the Indigenous languages of her home country. She created NightOwlGPT, a new AI-powered translation app.
In an email to me, Lamentillo wrote:
“In the Philippines alone, we are working on nine languages, many of which are endangered. Our goal is to ensure that these languages, not just the dominant ones, have a place in the digital world.”
We have seen that in the hands of the powerful, AI software can lead to oppressive forms of control, such as excessive AI-powered surveillance by Amazon and the U.S. government鈥檚 unethical data mining tactics.
When it comes to the survival or extinction of languages, it is important to question the power behind AI tools. Who controls them, and who benefits from them?
When I asked about the democratization of AI, Lamentillo noted the need for inclusivity:
“AI’s rapid advancement could parallel historical patterns of colonization. If AI is truly a black swan event, a disruptive moment in history, then what happens when 99 per cent of languages are left behind? This is more than just a linguistic issue; it’s a serious matter of accessibility, representation and digital equity. If we don’t change who is leading AI development, we risk creating a new form of colonization, one where only a small fraction of the world has the tools to thrive.”
Diversity of voices
At a recent workshop series on endangered languages, Emmanuel Ngué Um, a professor of linguistics at the University of Yaoundé I in Cameroon, spoke on behalf of a research team of African linguists.
They are currently using Mozilla’s Common Voice platform to create open-source datasets containing thousands of words and audio recordings in 31 African languages.
The platform aims to make speech recognition and voice-based AI more inclusive by crowd-sourcing a massively multilingual speech corpus. But this process is not without significant challenges in Africa.
Ngué Um noted that building datasets for languages with many dialects is not straightforward. There may be no standardized spelling or pronunciation that AI could adopt as the accepted norm for the language.
Because of postcolonial changes, many African languages do not have one unified or agreed-upon writing system. This issue can slow the creation of teaching tools, but many local efforts backed by UNESCO are underway to change this.
So, how do automatic speech recognition tools deal with dialectal diversity? And how do text-to-speech models handle competing writing systems?
As Ngué Um wrote in an email to me:
“AI has been instrumental in delivering services that applied linguists have promised but are slow to deliver. This is not due to a lack of will or means on the part of linguists, but rather, because of the linguistic reality in Africa. Despite the impact of colonization and the imposition of a monolithic ideal on language reality, Africa reflects the plurality, fluidity and resourcefulness that drive human communication… If AI is informed by these intricacies at all phases of its implementation, it will adequately address the diversity of voices… in Africa.”
It is clear that AI engineers and computational linguists need to adopt thoughtful approaches that take into account the unique circumstances of each language.
In the not-too-distant future, using AI tools to learn and communicate in under-resourced languages may become the norm. However, that shift depends on financial backing, accurate training data for machine learning, and community desire to embrace AI. Ultimately, data sovereignty and equitable access must be at the core of AI tools.
___
Anna Luisa Daigneault volunteers for Living Tongues Institute for Endangered Languages, a non-profit organization whose work is not connected to contents of this article.
___
This article is republished from The Conversation under a Creative Commons license. Read the original article: https://theconversation.com/how-ai-could-help-safeguard-indigenous-languages-255359