The best way to develop and intellectualise African languages in our universities is by using technology, says Professor Nobuhle Hlongwa, Chairperson of the Community of Practice for African Languages (CoPAL), one of 11 active communities of practice at Universities South Africa (USAf).
She was speaking at the group’s second meeting of the year, which was hosted at the Pretoria West campus of the Tshwane University of Technology (TUT) on 20 June 2023.
Hlongwa is also Dean and Head of the School of Arts at the University of KwaZulu-Natal. In fact, most CoPAL members are in humanities, traditionally the institutional home of languages. This made Dr Etienne van Wyk’s address at the CoPAL meeting stand out even more.
“If you’re wondering why a computer scientist is addressing you today,” said Dr van Wyk, “I got this assignment because I expressed an interest.” Van Wyk is TUT’s Executive Dean of the Faculty of Information and Communications Technology (ICT) and rector of its Soshanguve campus. He explained that he was keen to assist TUT in implementing its new language policy from a technological perspective.
The policy relates to the government’s Language Policy Framework for Public Higher Education Institutions, promulgated in 2020, which aims to enhance the status and role of previously marginalised local languages in universities.
Van Wyk said the timing of the CoPAL meeting was perfect because TUT was about to start implementing its policy after a lengthy review process. “It went through all their washing machines and approval processes until it was approved by Council late last year,” he said.
Van Wyk, who was on the task team that developed TUT’s policy, said his interest in the policy – and so by implication in the work of CoPAL, which has been spearheading the development of institutional policies and the implementation of the Language Policy Framework – was sparked by three triggers. These were the value of ICT at TUT, an audit where he could not provide a positive answer to any of the questions, and the huge interest in ChatGPT.
ICT has a dedicated position at TUT
TUT is the only South African university with an ICT faculty. This puts the institution in the singular position to do research and innovation in many areas within the discipline of using digital technologies. One flagship enterprise is in digital agriculture, involving 22 projects using artificial intelligence (AI) on a smart farm.
“We’re not just an academic faculty; we are very much involved in innovation and applying our technologies. And one of the areas we have identified to work in is within the domain of languages,” said Dr van Wyk.
TUT had an agreement with the University of Trento in Italy about developing databases of African indigenous languages. This links to the PhD research of Mr Dan Masethe, section head in the department of computer science, who was also present at the meeting.
Masethe is working on resolving lexical ambiguity, that is, words with multiple meanings, in Sesotho sa Leboa, or Sepedi. “We’re looking forward to him completing his doctorate, which will allow us to do all sorts of ICT things on that lexicon database,” said Dr van Wyk.
SADiLaR’s audit exposed shortfalls
The second trigger for Van Wyk’s interest in the development of marginalised African languages was an event that had taken place in the very same venue as the CoPAL meeting on the TUT campus. The South African Centre for Digital Language Resources (SADiLaR) had visited TUT as part of its nationwide audit of universities’ resources that could support implementation of the language policy. SADiLaR is a Department of Science and Innovation (DSI) initiative based at North-West University and headed by Professor Langa Khumalo, the former chairperson of CoPAL.
“This audit was significant for us as the people’s university, particularly as a university teaching the following six indigenous languages spoken in South Africa: in alphabetical order, Afrikaans, isiZulu, Sepedi, Setswana, Sivenda, Xitsonga, as well as South African sign language.”
The audit was also significant because Van Wyk was asked whether the faculty had specific ICT tools in place, “and if we don’t have them, why not? Because we are the ICT faculty and we build things like that. Since then, I’ve been trying to get the questions again from SADiLaR because I wasn’t focusing on exactly what I was saying at the time. I was just saying that we didn’t have the tools.”
He said the ICT Faculty was committed to assisting TUT with implementing the language policy. “And in doing that, I will actually be assisting CoPAL with their objectives,” said Dr van Wyk.
The recent explosion of interest in ChatGPT
The third trigger for Van Wyk’s interest in African languages stems from large language models (LLMs) such as ChatGPT.
“These are just computational models that use natural language processing systems with billions of parameters. And they have shown the capabilities to generate creative text, to solve mathematical theorems, to predict protein structures, to answer reading comprehension questions,” he said.
“The development of LLMs is one of the clearest cases of the substantial potential benefits AI can offer at scale to billions of people,” said Van Wyk. Despite the threats some feel they pose to jobs, Van Wyk said access to LLMs remains limited because of the resources required to train and run such large models.
“This restricted access has limited researchers’ ability to understand how and why these large language models work, hindering the progress on efforts to improve their robustness and mitigate known issues such as bias, toxicity, and the potential for generating misinformation,” he said.
Teaching in different languages
One of the objectives of TUT’s language policy, said Van Wyk, is that each campus has a specific language they wish to develop as an instructional language. There are already many voice cloning AI apps available that, when given a sample of a voice, can use their training from large datasets of speech to generate the audio in a different language in real time.
One of the superior voice replicators is Microsoft’s VALL-E. The X version of it can clone a voice from a sample of four to 10 seconds and then use it to synthesise speech in a different language, while preserving the original speaker’s voice, emotion and tone. “Which means I can be speaking to you now expressing this tone, and emotion, and you can be sitting there listening to me in another language, any language in the world, using this voice clone software. It shouldn’t be a challenge anymore for me, in Soshanguve, to teach whoever I can and speak English, even Afrikaans, and they’ll be hearing it in the language of their choice with the same voice, emotion and tone. We shouldn’t have language barriers anymore, if we are successful in developing these tools,” said Dr van Wyk.
His faculty is serious about making an impact in the field of computational linguistics and natural language processing. They will be forming a group, with Masethe as its project champion, to ensure it can use technology effectively in implementing TUT’s language policy.
Addressing the 40 delegates from universities throughout the country, he said: “May this gathering be an inspiring experience, fueling our commitment to the preservation, revitalisation and growth of African languages. Together, let us embark on this shared journey with enthusiasm, dedication, and a deep sense of purpose.”
Simultaneous translation will help African students
Professor Hlongwa said Van Wyk’s presentation dealt with what CoPAL and many universities are trying to do. Most tertiary students are African and if they could be educated in their home languages, their performance would improve, as would the throughput rates.
She said they were constantly being told at conferences of the African Language Association of Southern Africa (ALASA) that “this dream of human language technologies” needs a lot of spoken and written data for computers to generate the type of software Van Wyk had been speaking of.
She said she hoped Dr van Wyk would be attending the next ALASA conference, to be held at TUT from 26 to 29 September. “We will be happy to hear more on what he has presented,” said Professor Hlongwa.
Gillian Anstey is a contract writer for Universities South Africa.