SciELO - Scientific Electronic Library Online

 
Literator (Potchefstroom. Online)

On-line version ISSN 2219-8237
Print version ISSN 0258-2279

Abstract

BOSCH, Sonja E. and GRIESEL, Marissa. Strategies for building wordnets for under-resourced languages: The case of African languages. Literator [online]. 2017, vol.38, n.1, pp.1-12. ISSN 2219-8237. http://dx.doi.org/10.4102/lit.v38i1.1351.

The African Wordnet Project (AWN) aims to build wordnets for five African languages: Setswana, isiXhosa, isiZulu, Sesotho sa Leboa (also referred to as Sepedi or Northern Sotho) and Tshivenda. Currently, the so-called expand model, which is based on the structure of the English Princeton WordNet (PWN), is used to develop the African Wordnets manually on an ongoing basis. This is labour-intensive work that must be performed by linguistic experts, guided by several considerations such as the level of lexicalisation of a term in the African language. Until now, linguists have been responsible for identifying and translating appropriate synsets without much help from electronic resources, because for African languages even basic resources such as computer-readable bilingual wordlists are usually not freely available. Methods to speed up the manual development of synsets and to ease the workload of the human language experts were recently investigated. These centred on using the minimal amount of information available in bilingual dictionaries to identify synsets in the PWN that should be included in the AWN, transferring information from the dictionaries to the wordnet, and presenting the potential synsets to linguists for final approval and inclusion in the wordnets. In this article, we describe the methodology developed for building the African Wordnets, a potentially significant resource for natural language processing applications. We investigate the available resources that could be used and the resources that had to be developed, and we explain initial results and future plans.
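The dictionary-driven step described in the abstract (identify PWN synsets via a bilingual wordlist, transfer the translations, and queue the result for a linguist) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy synset table, the English-isiZulu wordlist, and the names `Candidate`, `propose_candidates` and `approve` are all hypothetical stand-ins, and a real system would read the PWN and dictionary data from their own formats.

```python
from dataclasses import dataclass

# Hypothetical toy stand-in for a few PWN synsets:
# synset id -> (English lemmas, gloss). Not real PWN data files.
TOY_PWN = {
    "dog.n.01": (["dog", "domestic dog"], "a domesticated canine"),
    "water.n.01": (["water", "H2O"], "a clear liquid essential for life"),
    "house.n.01": (["house"], "a dwelling that serves as living quarters"),
}

# Hypothetical English -> isiZulu wordlist (illustrative entries only).
TOY_WORDLIST = {
    "dog": ["inja"],
    "water": ["amanzi"],
}


@dataclass
class Candidate:
    """A PWN synset proposed for inclusion, awaiting a linguist's decision."""
    synset_id: str
    english_lemmas: list
    gloss: str
    proposed_translations: list
    status: str = "pending"  # assumed workflow states: pending / approved


def propose_candidates(pwn, wordlist):
    """Return one Candidate per synset with at least one lemma in the wordlist."""
    candidates = []
    for synset_id, (lemmas, gloss) in pwn.items():
        translations = []
        for lemma in lemmas:
            for word in wordlist.get(lemma, []):
                if word not in translations:  # keep order, drop duplicates
                    translations.append(word)
        if translations:
            candidates.append(
                Candidate(synset_id, list(lemmas), gloss, translations)
            )
    return candidates


def approve(candidate):
    """Record the linguist's final approval of a proposed synset."""
    candidate.status = "approved"
    return candidate


if __name__ == "__main__":
    for cand in propose_candidates(TOY_PWN, TOY_WORDLIST):
        print(cand.synset_id, cand.proposed_translations, cand.status)
```

In this sketch, synsets with no dictionary match (here `house.n.01`) are simply left out of the queue, so the linguist only reviews entries for which some translation evidence exists; nothing enters the wordnet until `approve` is called.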


 

Creative Commons License: All the contents of this journal, except where otherwise noted, are licensed under a Creative Commons Attribution License.