Estonia Chooses Crowdsourcing to Preserve Its Local Language

To preserve its linguistic heritage and facilitate the development of speech technologies, the Estonian government is working to build the largest database of spoken Estonian, with the participation of volunteers from different community groups.

Recent trends may suggest that technology is putting cultural heritage at risk. In reality, however, technological progress and the preservation of cultural heritage are not parallel lines; they will inevitably meet at some point. The strategic use of technology will not only preserve underrepresented cultures but will also help promote them.

Since speech analysis technology first appeared in smart devices, it has saved considerable time and effort and has become widely used to deliver services in the public and private sectors. Building and running these systems effectively, however, requires an enormous amount of training data for the underlying algorithms, and for spoken language that data means many hours of recorded speech. How can it be obtained? This is the question facing organizations that want to build digital solutions to improve their customer experience. These technologies perform better when trained on their users' voices, but privacy and data protection considerations make that goal harder to achieve, especially when the implementer is a software developer trying to build a generalizable training model. Many popular voice assistant apps have also shown cases of gender and racial bias.

In other cases, such as Estonia's, the problem stems from the declining use of the native language. In the services, information technology and higher education sectors, reliance on foreign languages and the growing presence of an international workforce have reduced the use of the mother tongue. This prompted the government to launch the "Estonian Language Strategy 2021-2035" to safeguard the language's position amid the rapid growth of the digital society.

Estonia is a pioneer in digitization. Its Ministry of Economic Affairs and Communications partnered with the Information System Authority to launch the "Donate Your Speech" project, which crowdsources recordings of the local language.

Through this campaign, the state appeals to all adults who speak Estonian, whether as their mother tongue or as an acquired language. It invites them to literally donate their voices in order to build an extensive database. The database will be made available to government, private and research institutions that wish to develop speech-based services.

Conceptually, voice crowdsourcing means collecting a large number of recordings from diverse populations or in different styles of speech. The resulting patterns can reflect languages, dialects or even speech impairments common to certain social groups. Speech technology trained on such data can also be used to record meetings, turn interviews into transcripts and generate automatic subtitles for media.

The campaign makes use of Mozilla's crowdsourcing tool, Common Voice. Through it, the project aims to build an open database of 4,000 hours of high-quality recorded speech, along with translated text and sign language datasets. Open data was chosen so that separate datasets would not have to be created for each individual project.

To collect this data, the ministry is preparing a wide-ranging advertising campaign across traditional and social media. The campaign aims to raise awareness of the importance of language technologies and of preserving the local language. The technical team has designed a dedicated website that participants can reach from any device with audio input, such as a personal computer, tablet or smartphone. There, they can talk about any topic of their choice.
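Because the site accepts recordings from any device with a microphone, clips arrive with very different sample rates, lengths and loudness levels. The article does not describe how uploads are checked. The following is only a minimal sketch of the kind of automatic screening a collection site might run before accepting a clip. It assumes uploads arrive as 16-bit PCM WAV, and every threshold (`MIN_SECONDS`, `MAX_SECONDS`, `MIN_RATE_HZ`, `CLIP_LEVEL`) is a hypothetical value chosen for illustration. It uses only Python's standard `wave` and `array` modules.

```python
import io
import sys
import wave
from array import array

# Hypothetical acceptance thresholds: the article does not say what checks the site applies.
MIN_SECONDS = 1.0
MAX_SECONDS = 15.0
MIN_RATE_HZ = 16_000
CLIP_LEVEL = 32_000  # peak magnitude close to the 16-bit limit of 32767


def screen_upload(wav_bytes: bytes) -> list[str]:
    """Return the problems found in an uploaded 16-bit PCM WAV; an empty list means accept."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    if width != 2:
        return ["expected 16-bit samples"]
    problems = []
    seconds = n_frames / rate
    if rate < MIN_RATE_HZ:
        problems.append(f"sample rate {rate} Hz is too low")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        problems.append(f"duration {seconds:.1f} s is outside the accepted range")
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores samples little-endian
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        problems.append("recording is silent")
    elif peak >= CLIP_LEVEL:
        problems.append("recording appears clipped")
    return problems


def synth_wav(samples, rate=16_000) -> bytes:
    """Build a mono 16-bit WAV in memory; used here only to exercise screen_upload."""
    pcm = array("h", samples)
    if sys.byteorder == "big":
        pcm.byteswap()
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(pcm.tobytes())
    return buf.getvalue()


print(screen_upload(synth_wav([1000, -1000] * 16_000)))  # 2 s of audio -> []
```

A two-second clip at moderate volume passes with an empty list. A half-second silent clip would be rejected for both its length and its silence. A real pipeline would probably add further checks, such as noise estimation or human review.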

Earlier this year, the government launched Bürokratt, an AI-powered program that lets people use voice assistants to access public services. The public broadcaster has also deployed a smart system that converts programs broadcast live on television into brief written captions, which are followed by tens of thousands of people with hearing difficulties. Another system, called "Hans", replaces stenographers in Parliament. It records parliamentary debates as audio files and converts them into written text, which editors review before it is published on Parliament's official website.

Crowdsourcing projects usually face several challenges, however. The first is data quality and accuracy: audio recordings can suffer from technical problems that reduce the clarity of speech and make transcription harder. The second is data privacy, especially since the recordings will be available on an open portal. For this reason, any information that could identify contributors will be removed once the recordings are collected. Contributors will still be able to delete their recordings whenever they wish.
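The article says two things about privacy: identifying information is removed before publication, and contributors can still delete their recordings later. One common way to satisfy both is to replace a contributor's direct identifier with a keyed hash (a pseudonym). The article does not say how the Estonian project actually does this. The sketch below is a minimal illustration under stated assumptions. The `Submission` record, its field names and `SECRET_KEY` are hypothetical. The pseudonym is an HMAC-SHA256 of the normalized email address, computed with Python's standard `hmac` and `hashlib` modules.

```python
import hashlib
import hmac
from dataclasses import dataclass

# Assumption: a server-side secret that is never published with the dataset.
SECRET_KEY = b"replace-with-a-secret-stored-outside-the-open-portal"


@dataclass
class Submission:
    """Hypothetical record for one donated clip; field names are illustrative."""
    contributor_email: str  # direct identifier: must never reach the open portal
    audio_path: str
    sentence: str
    age_band: str  # coarse attribute kept for representation checks


def pseudonym(email: str) -> str:
    """Keyed hash of the normalized email: stable per contributor, but not
    linkable back to the email by anyone who lacks SECRET_KEY."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()[:16]


def publish(submissions: list[Submission]) -> list[dict]:
    """Build the public rows, replacing the email with a pseudonymous client_id."""
    return [
        {
            "client_id": pseudonym(s.contributor_email),
            "path": s.audio_path,
            "sentence": s.sentence,
            "age_band": s.age_band,
        }
        for s in submissions
    ]


def delete_contributor(rows: list[dict], email: str) -> list[dict]:
    """Honour a deletion request by dropping every row linked to that contributor.
    A real system would also have to delete the audio files themselves."""
    target = pseudonym(email)
    return [row for row in rows if row["client_id"] != target]
```

The operator keeps the key, so a contributor who asks for deletion can be matched to their clips. Anyone browsing the open portal sees only an opaque `client_id`. The sketch does not scrub identifying details a speaker might say aloud in a recording; that would need separate handling.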

The biggest challenge, however, is data bias: some social groups will inevitably participate less, including minorities, the elderly and people of determination. To build a database that represents all Estonians, additional efforts must be made to reach different population groups and to address each of them with the most appropriate awareness messaging.
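Reaching underrepresented groups first requires noticing which groups are underrepresented. The sketch below shows one simple way to do that. For each group, it compares the group's share of the collected audio hours with its share of the population, and it flags groups that fall below a chosen fraction. The group labels, the population shares, the 100 hours of example data and the `threshold` of 0.5 are all hypothetical. They are not Estonian statistics or the project's actual method.

```python
from collections import defaultdict


def representation_gaps(clips, population_share, threshold=0.5):
    """Flag groups whose share of recorded audio falls well below their population share.

    clips: iterable of (group_label, duration_seconds) pairs
    population_share: dict mapping group_label -> fraction of the population
    threshold: a group is flagged when (audio share / population share) < threshold
    Returns {group_label: ratio rounded to 2 decimals} for the flagged groups.
    """
    seconds = defaultdict(float)
    for group, duration in clips:
        seconds[group] += duration
    total = sum(seconds.values())
    gaps = {}
    for group, pop_share in population_share.items():
        if pop_share <= 0:
            continue  # nothing to compare against
        audio_share = seconds[group] / total if total else 0.0
        ratio = audio_share / pop_share
        if ratio < threshold:
            gaps[group] = round(ratio, 2)
    return gaps


# Illustrative figures only: 100 hours collected so far, split by age band.
HOUR = 3600
clips = [("18-39", 50 * HOUR), ("40-64", 40 * HOUR), ("65+", 10 * HOUR)]
population = {"18-39": 0.35, "40-64": 0.40, "65+": 0.25}
print(representation_gaps(clips, population))  # {'65+': 0.4}
```

In this made-up example, speakers aged 65 and over supply 10% of the audio but make up 25% of the population. That gives a ratio of 0.4, so they would be the group to target with extra outreach.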

Voice crowdsourcing produces more diverse data and, as a result, smarter algorithms. The campaign will also help embed language technologies in the information systems used by the public and private sectors and improve access to services.

Speech recognition software can support the work of security, criminal justice, judicial, health, research and media agencies, for which transcription and detailed record-keeping are vital.

In the long run, these efforts aim to make voice recognition a positive experience for everyone, regardless of language, gender, age or affiliation.
