Dictionary Writing System (DWS) + Corpus Query Package (CQP): The Case of TshwaneLex

Gilles-Maurice de Schryver; Guy De Pauw

doi:10.5788/17-0-554


Keywords: LEXICOGRAPHY, DICTIONARY, SOFTWARE, DICTIONARY WRITING SYSTEM (DWS), CORPUS QUERY PACKAGE (CQP), TSHWANELEX, CORPUS, CORPUS ANNOTATION, PART-OF-SPEECH TAGGER (POS-TAGGER), MACHINE LEARNING, NORTHERN SOTHO (SESOTHO SA LEBOA)

Abstract

In this article, the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter, it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as such the encouraging outcomes of this study are far-reaching.

Published

2014-08-27

How to Cite

de Schryver, G.-M., & De Pauw, G. (2014). Dictionary Writing System (DWS) + Corpus Query Package (CQP): The Case of TshwaneLex. Lexikos, 17. https://doi.org/10.5788/17-0-554

Issue

Vol. 17 (2007)

Section

Lexikoprogrammatuur/Lexicosoftware

Copyright of all material published in Lexikos will be vested in the Board of Directors of the Woordeboek van die Afrikaanse Taal. Authors are free, however, to use their material elsewhere provided that Lexikos (AFRILEX Series) is acknowledged as the original publication source.

Creative Commons License CC BY 4.0