Detection and Description of Neologisms in Korean Lexicography: Methodological Issues in Corpus Balance, Word Unit Bias and LLM Assistance

  • Kilim Nam, Department of Korean Language and Literature, Yonsei University, Seoul, South Korea (https://orcid.org/0009-0001-9358-2673)
  • Soojin Lee, International Exchange Department, Kyungpook National University, Daegu, South Korea (https://orcid.org/0009-0002-1720-1880)
  • Hae-Yun Jung, International Exchange Department, Kyungpook National University, Daegu, South Korea (https://orcid.org/0009-0006-4837-2569)

Abstract

This study explores the potential application of large language models (LLMs) in Korean neologism extraction and dictionary compilation while critically examining the limitations of existing methods, including the bias toward news-oriented data and morphological neologisms. By analysing data from news corpora alongside messenger and online post corpora, the study identifies significant limitations in current news-centred approaches, particularly in detecting the first occurrences and extracting neologisms related to everyday topics. Experimental results involving LLMs demonstrate their potential to address the limitations of news-biased neologism extraction by suggesting unregistered words from diverse web-based contexts. However, issues such as duplication and overgeneration persist. In tasks involving semantic neologism recommendation and dictionary microstructure creation, LLMs performed relatively well with high-frequency and news-biased topics when provided with additional contextual prompts, yet revealed limitations with low-frequency and non-news-biased neologisms. These findings suggest that the performance of current LLMs heavily relies on the diversity of training data and user-provided contextual information. The results of this study underscore the need for further investigation into the critical challenges in neologism research, lexicography, and corpus linguistics, as well as the role lexicography might play in enhancing the performance of LLMs.

Keywords: lexicography, neologisms, unregistered words, news corpus, semantic neologism, representativeness, balance, lexicographic data, macrostructure, large language models

Published
2025-06-10
How to Cite
Nam, K., Lee, S., & Jung, H.-Y. (2025). Detection and Description of Neologisms in Korean Lexicography: Methodological Issues in Corpus Balance, Word Unit Bias and LLM Assistance. Lexikos, 35(1), 414-438. https://doi.org/10.5788/35-1-2045
Section
Leksikofokus / Lexicofocus