SciELO - Scientific Electronic Library Online

 

    Lexikos

    On-line version ISSN 2224-0039
    Print version ISSN 1684-4904

    Abstract

    NAM, Kilim; LEE, Soojin and JUNG, Hae-Yun. Detection and Description of Neologisms in Korean Lexicography: Methodological Issues in Corpus Balance, Word Unit Bias and LLM Assistance. Lexikos [online]. 2025, vol.35, pp.414-438. ISSN 2224-0039. https://doi.org/10.5788/35-1-2045.

    This study explores the potential application of large language models (LLMs) in Korean neologism extraction and dictionary compilation while critically examining the limitations of existing methods, including the bias toward news-oriented data and morphological neologisms. By analysing data from news corpora alongside messenger and online post corpora, the study identifies significant limitations in current news-centred approaches, particularly in detecting the first occurrences and extracting neologisms related to everyday topics. Experimental results involving LLMs demonstrate their potential to address the limitations of news-biased neologism extraction by suggesting unregistered words from diverse web-based contexts. However, issues such as duplication and overgeneration persist. In tasks involving semantic neologism recommendation and dictionary microstructure creation, LLMs performed relatively well with high-frequency and news-biased topics when provided with additional contextual prompts, yet revealed limitations with low-frequency and non-news-biased neologisms. These findings suggest that the performance of current LLMs heavily relies on the diversity of training data and user-provided contextual information. The results of this study underscore the need for further investigation into the critical challenges in neologism research, lexicography, and corpus linguistics, as well as the role lexicography might play in enhancing the performance of LLMs.

    Keywords: lexicography; neologisms; unregistered words; news corpus; semantic neologism; representativeness; balance; lexicographic data; macro-structure; large language models.
