Dictionary Writing System (DWS) + Corpus Query Package (CQP): The Case of TshwaneLex

Gilles-Maurice de Schryver, Guy De Pauw

Abstract


In this article, the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw and annotated corpus data. With regard to the latter, it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as such the encouraging outcomes of this study are far-reaching.

Keywords


LEXICOGRAPHY, DICTIONARY, SOFTWARE, DICTIONARY WRITING SYSTEM (DWS), CORPUS QUERY PACKAGE (CQP), TSHWANELEX, CORPUS, CORPUS ANNOTATION, PART-OF-SPEECH TAGGER (POS-TAGGER), MACHINE LEARNING, NORTHERN SOTHO (SESOTHO SA LEBOA)



DOI: https://doi.org/10.5788/17-0-554




ISSN 2224-0039 (online); ISSN 1684-4904 (print)

Creative Commons License CC BY 4.0


Powered by OJS and hosted by Stellenbosch University Library and Information Service since 2011.

