Estonia chooses crowdsourcing as a way to preserve local language

To preserve its linguistic heritage and facilitate the development of speech technologies, the Estonian government is working to create the largest database of spoken language with the participation of volunteers from different community groups.

Recent trends may suggest that technology is putting popular heritage at risk. In reality, technological progress and the preservation of cultural heritage are not parallel lines; they inevitably meet at some point. The strategic use of technology will not only preserve underrepresented cultures, it will even help promote them.

Since speech recognition technology appeared in smart devices, it has saved a great deal of time and effort and has become widely used to deliver services in the public and private sectors. But building and running it effectively requires an enormous set of training data to develop the algorithms. For spoken language, this data is hours and hours of recorded speech. How, then, can it be obtained? This is the question facing organizations that want to create digital solutions to improve their customers' experience. These technologies perform better when trained on real user voices, but data protection and privacy considerations constrain this goal, especially when the implementer is a software developer trying to create a generalizable training model. Many popular voice assistant apps have also recorded cases of gender and racial bias.

In other cases, the problem stems from low rates of use of native languages, as in Estonia. In the services, information technology and higher education sectors, reliance on foreign languages and the growing presence of an international workforce have contributed to the decline of the mother tongue. This prompted the government to launch the "Estonian Language Strategy 2021-2035" to maintain the language's position amid the rapid growth of the digital society.

Estonia is a pioneer in digitization, and its Ministry of Economic Affairs and Communications collaborated with the Information System Authority to launch the "Donate Your Speech" project to crowdsource the local language.

In this campaign, the state addresses all adults who speak Estonian, whether as their mother tongue or as an acquired language. It invites them to literally donate their words in order to build an extensive database and make it available to government, private and research institutions wishing to develop speech-based services.

Conceptually, voice crowdsourcing means collecting a large number of voice samples from diverse populations or across different patterns of speech. Patterns here refer to languages, dialects, or even speech impairments that may be common to certain social groups. The resulting technology can also be used to record meetings, convert interviews into transcripts, and create automatic media subtitles.

The campaign benefits from Mozilla's crowdsourcing tool. Through it, the campaign seeks to establish an open database of 4,000 hours of high-quality spoken speech, along with translated text and sign language datasets. Open data was chosen to eliminate the need to create separate datasets for each individual project.

To collect this data, the Ministry is preparing a wide advertising campaign, to be broadcast through various media outlets and social media, to raise awareness of the importance of language technologies and of preserving the local language. The technical team has designed a dedicated website that participants can access from any device with audio input, such as a personal computer, tablet or smartphone, and talk about any topic of their choice.

Earlier this year, the government launched an app called Bürokratt, an AI-powered program that allows people to use voice assistants to access public services. The Public Broadcasting Agency was also able to develop a smart system called "Hans", which replaces stenographers: it converts the content of programs broadcast live on television into written text read by tens of thousands of people with hearing difficulties. It also records parliamentary debates as audio files and converts them into written text for editors to review before publication on the official website of Parliament.

But crowdsourcing projects usually face several challenges. The first is the quality and accuracy of the data: recording and transcribing audio can run into technical problems that affect the clarity of speech. The second challenge is data privacy, especially since the recordings will be available on an open portal. So, after the recordings are collected, information that could identify their owners will be deleted, and participants will still be able to delete their recordings whenever they want.

The biggest challenge is data bias. Some societal groups will, naturally, participate less, including minorities, the elderly and people of determination (people with disabilities). Hence, to build a database that represents all Estonians, additional efforts must be made to reach out to different population groups and address them with the most appropriate awareness messaging.
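One simple way to make this bias visible is to compare each group's share of the donated hours with its share of the population. The sketch below is illustrative only: the `GroupStats` type, the `underrepresented` helper, the 0.8 tolerance and all the numbers are assumptions made for this example, not part of the campaign's actual tooling or data.

```python
from dataclasses import dataclass


@dataclass
class GroupStats:
    name: str
    population_share: float  # fraction of the target population, 0..1 (hypothetical)
    donated_hours: float     # hours of speech collected from this group (hypothetical)


def underrepresented(groups, tolerance=0.8):
    """Return the names of groups whose share of donated hours is below
    `tolerance` times their share of the population."""
    total_hours = sum(g.donated_hours for g in groups)
    if total_hours == 0:
        # Nothing collected yet: every group still needs outreach.
        return [g.name for g in groups]
    flagged = []
    for g in groups:
        data_share = g.donated_hours / total_hours
        if data_share < tolerance * g.population_share:
            flagged.append(g.name)
    return flagged


# Made-up illustrative numbers, not real campaign data.
sample = [
    GroupStats("ages 18-39", 0.35, 500.0),
    GroupStats("ages 40-64", 0.40, 400.0),
    GroupStats("ages 65+", 0.25, 100.0),
]
print(underrepresented(sample))
```

With these made-up figures, only the oldest group is flagged, which is the kind of signal that would justify targeted outreach.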

Voice crowdsourcing contributes to more diverse data collection and, therefore, to smarter algorithms. The campaign will also help embed language technologies in information systems used in the public and private sectors and improve access to services.

Speech recognition software is useful in facilitating the work of security, criminal, judicial, health, research and media agencies, where record-keeping and detailed reporting are vital.

In the long run, these efforts aim to make voice recognition a positive experience for everyone, regardless of language, gender, age or affiliation.

References:

https://www.hm.ee/sites/default/files/htm_eesti_keele_arengukava_2020_a4_web_en.pdf

https://e-estonia.com/estonian-parliament-uses-speech-recognition-technology-to-create-verbatim-records/

https://annetakonet.ee/projekti-kirjeldus/

https://govinsider.asia/inclusive-gov/estonia-crowdsources-speech-data-for-the-preservation-of-the-estonian-language/

https://thenextweb.com/news/how-mozilla-is-crowdsourcing-speech-to-diversify-voice-recognition
