Corpus-based Lexicography for Lesser-resourced Languages — Maximizing the Limited Corpus

D.J. Prinsloo

doi:10.5788/25-1-1300

D.J. Prinsloo, Department of African Languages, University of Pretoria, Pretoria, South Africa

Keywords: corpus-based lexicography, lesser-resourced languages, limited corpora, corpus tools, lexicographic tools

Abstract

This article focuses on lesser-resourced languages for which only very limited corpora are available, and on how such relatively small, often unbalanced, raw corpora can be used to maximum effect for lexicographic purposes, yielding results similar to those obtained from bigger corpora. Sepedi and Afrikaans are studied in this regard. The aim is to determine to what extent enlarging a corpus, e.g. from one million to 10 million words and from 10 million to 100 million words, enhances its potential for (a) macrostructure compilation, (b) sourcing information on the most important microstructural aspects and (c) the creation of lexicographic tools. It is argued that valuable and even sufficient data for the compilation of a specific dictionary can be extracted from a relatively small corpus of approximately one million words, but that "bigger" does in some instances mean "better".

Published

2015-11-20

How to Cite

Prinsloo, D. (2015). Corpus-based Lexicography for Lesser-resourced Languages — Maximizing the Limited Corpus. Lexikos, 25(1). https://doi.org/10.5788/25-1-1300

Issue

Vol. 25 (2015)

Section

Artikels/Articles

Copyright of all material published in Lexikos will be vested in the Board of Directors of the Woordeboek van die Afrikaanse Taal. Authors are free, however, to use their material elsewhere provided that Lexikos (AFRILEX Series) is acknowledged as the original publication source.

Creative Commons License CC BY 4.0