Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems

de Wet, Febe; Kleynhans, Neil; van Compernolle, Dirk; Sahraeian, Reza

doi:10.17159/sajs.2017/20160038

South African Journal of Science

On-line version ISSN 1996-7489
Print version ISSN 0038-2353

Abstract

DE WET, Febe; KLEYNHANS, Neil; VAN COMPERNOLLE, Dirk and SAHRAEIAN, Reza. Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems. S. Afr. j. sci. [online]. 2017, vol.113, n.1-2, pp.1-9. ISSN 1996-7489. http://dx.doi.org/10.17159/sajs.2017/20160038.

For purposes of automated speech recognition in under-resourced environments, techniques used to share acoustic data between closely related or similar languages become important. Donor languages with abundant resources can potentially be used to increase the recognition accuracy of speech systems developed in the resource-poor target language. The assumption is that adding more data will increase the robustness of the statistical estimations captured by the acoustic models. In this study we investigated data sharing between Afrikaans and Flemish, an under-resourced and a well-resourced language, respectively. Our approach focused on exploring model adaptation and refinement techniques associated with hidden Markov model-based speech recognition systems to improve the benefit of sharing data. Specifically, we focused on the use of currently available techniques, some possible combinations and the exact utilisation of the techniques during the acoustic model development process. Our findings show that simply using normal approaches to adaptation and refinement does not result in any benefits when adding Flemish data to the Afrikaans training pool. The only observed improvement was achieved when developing acoustic models on all available data but estimating model refinements and adaptations on the target data only.

SIGNIFICANCE:
• Acoustic modelling for under-resourced languages
• Automatic speech recognition for Afrikaans
• Data sharing between Flemish and Afrikaans to improve acoustic modelling for Afrikaans

Keywords: acoustic modelling; Afrikaans; Flemish; automatic speech recognition.
