Gender Bias in Computer-generated Thesauri: The Case of the Serbian Section of Kontekst.io, a Thesaurus of Synonyms and Semantically Related Terms

  • Dragana Čarapić, Faculty of Philology, University of Montenegro, Nikšić, Montenegro (https://orcid.org/0000-0003-3375-0169)
  • Milica Vuković-Stamatović, Faculty of Philology, University of Montenegro, Nikšić, Montenegro (https://orcid.org/0000-0001-5497-1361)

Abstract

This paper studies gender bias in the computer-generated thesaurus Kontekst.io, a search portal of synonyms and semantically related terms in Serbian, Croatian and Slovenian. Its Serbian section, the focus here, is based on a natural language processing (NLP) technique called word embeddings and a large internet corpus of Serbian. Gender bias is uncovered in four selected entries of this thesaurus: žena (woman), muškarac (man), d(j)evojka (young woman) and momak (young man). The analysis is first conducted semantically, with the terms found grouped into semantic fields. Then, in the vein of earlier studies of gender bias in traditional dictionaries and critical discourse analysis, an analysis of gender bias in the selected entries is provided. The results show that gender bias is ubiquitous and that it extends deeper than earlier studies of gender bias in word embeddings have shown. Based on the results, we give recommendations for improving this lexicographic product.

Keywords: gender bias, computer-generated thesaurus, word embeddings, Kontekst.io, Serbian, lexicography
Published
2025-09-30
How to Cite
Čarapić, D., & Vuković-Stamatović, M. (2025). Gender Bias in Computer-generated Thesauri: The Case of the Serbian Section of Kontekst.io, a Thesaurus of Synonyms and Semantically Related Terms. Lexikos, 35(2), 26-45. https://doi.org/10.5788/35-2-2068
Section
Artikels/Articles