Improving Online Translation Tools with AI for the Benefit of Navajo and Other Indigenous Languages

Bridging the Digital Divide: AI Innovation Brings Promise for Navajo and Other Vulnerable Indigenous Languages

While translation applications such as Google Translate support more than 100 languages, they often struggle with, or fail entirely on, Indigenous languages like Navajo. Although it is the most widely spoken Native American language in the United States, Navajo is frequently misidentified by mainstream language technologies. That may be about to change. New research from Dartmouth College shows that artificial intelligence (AI) can identify Navajo text with near-perfect accuracy, setting the stage for better recognition and support of endangered Indigenous languages online.

Presented at the 2025 conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL) in Albuquerque, the Dartmouth study shows that even with minimal resources, AI models can be trained to identify Navajo with 97-100% accuracy. This advance has significant implications for safeguarding Indigenous linguistic heritage, and it could play a role in revitalizing cultures at risk of vanishing.

The Challenges with Existing Language Tools

Although AI-driven language services have advanced quickly in recent years, their capabilities often reflect built-in biases. Languages with extensive digital representation, such as English, Spanish, or Mandarin, benefit from deeper AI integration, while those with limited written material or smaller speaker populations are overlooked. The disparity was stark when Dartmouth researchers evaluated Google’s Language Identification tool (LangID): Navajo phrases were misclassified as Icelandic, Lingala, or even Wolof.

“By building upon the principles behind LangID, we discovered that creating a classifier tailored to Indigenous languages is entirely achievable,” stated Ivory Yang, the lead author of the study and a PhD candidate at Dartmouth. The research team trained an AI model on a dataset of 10,000 well-sourced Navajo sentences. The resulting classifier identified Navajo with remarkable accuracy, showing that even modest investments in data collection and machine learning can yield significant results.
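
To make the idea of a language-identification classifier concrete, here is a minimal sketch in Python. It is not the Dartmouth team's model, whose architecture this article does not describe. It swaps in a common textbook technique, naive Bayes over character trigrams, and trains on a handful of illustrative phrases chosen for this example rather than on the study's 10,000 sentences. The class name NgramLanguageID, the helper char_ngrams, and the language codes used as labels are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = " " + text.lower().strip() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageID:
    """Hypothetical toy classifier: multinomial naive Bayes with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # language -> n-gram counts
        self.totals = Counter()             # language -> total n-grams seen
        self.docs = Counter()               # language -> number of training sentences
        self.vocab = set()                  # every n-gram seen in any language

    def fit(self, sentences, labels):
        for text, lang in zip(sentences, labels):
            grams = char_ngrams(text, self.n)
            self.counts[lang].update(grams)
            self.totals[lang] += len(grams)
            self.docs[lang] += 1
            self.vocab.update(grams)
        return self

    def scores(self, text):
        """Return a log-probability score per language (higher is more likely)."""
        grams = char_ngrams(text, self.n)
        n_docs = sum(self.docs.values())
        vocab_size = len(self.vocab)
        result = {}
        for lang in self.docs:
            logp = math.log(self.docs[lang] / n_docs)  # class prior
            for gram in grams:
                # Add-one smoothing keeps unseen n-grams from zeroing the score.
                numerator = self.counts[lang][gram] + 1
                denominator = self.totals[lang] + vocab_size
                logp += math.log(numerator / denominator)
            result[lang] = logp
        return result

    def predict(self, text):
        scored = self.scores(text)
        return max(scored, key=scored.get)


# Illustrative toy data only (assumption): a few common phrases per language,
# labeled with ISO 639-1 codes. A real system needs thousands of sentences.
texts = [
    "Yá'át'ééh", "Ahéhee'", "Diné bizaad",   # Navajo
    "Góðan daginn", "Takk fyrir",            # Icelandic
    "Good morning", "Thank you very much",   # English
]
labels = ["nv", "nv", "nv", "is", "is", "en", "en"]

model = NgramLanguageID().fit(texts, labels)
print(model.predict("Diné bizaad"))
```

Character n-grams suit this kind of task because they pick up spelling patterns, such as the apostrophes and accented vowels of Navajo orthography, without needing a dictionary or a tokenizer. With so little data the sketch only separates phrases that resemble its training examples; the accuracy figures reported in the study come from a far larger, curated dataset.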

Connecting to Related Languages

Beyond Navajo, the team examined how well the model generalizes to other Athabaskan languages, the family to which Navajo belongs. This family includes languages spoken by Apache nations and by Alaska Native peoples.

Even with extremely small datasets of as few as 20 sentences, the model identified related languages such as Western Apache, Mescalero Apache, and Jicarilla Apache as akin to Navajo. Although the identification is not perfect, this cross-linguistic recognition is what Yang and colleagues call a “bridge” effect, in which a better-resourced language serves as a template for building basic models of its lower-resource relatives.

The potential implications are considerable. Because preservation efforts are frequently hampered by a shortage of speakers and written resources, this approach could enable AI tools to recognize, and ultimately translate, not only Navajo but also rarer languages that lack substantial digital content.

Transitioning from Identification to Translation

Language identification is a vital first step, and the Dartmouth team sees it as the groundwork for more sophisticated capabilities. Once a language is accurately identified, tools for text translation, speech recognition, language learning, and education can be built on top of it.

Dr. Soroush Vosoughi from Dartmouth, co-author of the paper and assistant professor of computer science, underscores the cultural significance of visibility: “Many Indigenous languages lack the fundamental dignity of being acknowledged online, reflecting systemic bias in language technology. Revitalization starts with visibility, and visibility starts with identification.”

The team envisions a multi-layered plan for supporting Indigenous languages that encompasses:

  • Expanding AI models to include more Native American languages
  • Creating machine translation capabilities specifically for Navajo and other Athabaskan languages
  • Formulating educational tools to facilitate language learning and intergenerational teaching
  • Partnering with Indigenous communities to ensure that technology honors cultural values and norms

Yang remarks that while comprehensive translation tools are a long-term objective due to the structural intricacies of Indigenous languages, the research signifies a solid foundation. “At this stage, we are constructing the framework. Translation is significantly more challenging—but it is now a tangible possibility.”

Laying the Foundation for Language Revitalization

This AI-focused initiative is part of a larger effort at Dartmouth dedicated to preserving endangered languages through technological means. The team previously developed NüshuRescue, which facilitates machine translation from Chinese to Nüshu—an ancient script created and transmitted by women in southern China. These endeavors illustrate a growing acknowledgment among researchers that, when thoughtfully deployed, AI-driven tools can be leveraged to safeguard linguistic diversity.

Having Navajo and other Indigenous languages represented online is not merely about convenience or novelty—it embodies autonomy, empowerment, and identity. For the more than 350,000 Navajo speakers, along with other Athabaskan language communities, integration into widely used platforms validates their cultural heritage and may encourage younger generations to embrace and communicate in their ancestral language.

Partnerships with Communities Are Essential

As AI continues to influence our digital experiences, the inclusion of endangered languages must be approached with caution and collaboration—rather than mere automation. Indigenous knowledge holders, educators, and linguists must be integral to this process.