Semi-automatic Term Extraction for an isiZulu Linguistic Terms Dictionary

Langa Khumalo


The University of KwaZulu-Natal (UKZN) is compiling a series of Language for Special Purposes (LSP) dictionaries for various specialized subject domains in line with its language policy and plan. This paper focuses on term extraction in the linguistics subject domain. It advances the use of frequency analysis and keyword analysis as strategies for extracting terms for the compilation of a dictionary of isiZulu linguistic terms. The study uses the isiZulu National Corpus (INC) of about 1.2 million tokens as a reference corpus and an LSP corpus of about 100,000 tokens as a study corpus. The corpora are analysed with a software tool called WordSmith Tools (version 6). WordSmith Tools (henceforth WS Tools) is an integrated suite of three main programs, WordList, Concord and Keywords, used for analysing words and word patterns in any given text. WS Tools supports both qualitative and quantitative research on the language. Central to this study is a computational determination of which words are typical of the linguistic domain in isiZulu and therefore stand out as preferred candidates for headword selection. The study thus uses the corpus linguistics method as a basis for theoretical analysis. The advantage of this approach is that a corpus is stored and queried by computer, which makes it easy to find, sort and count items, either as a basis for linguistic description or for addressing language-related issues and problems. Using WS Tools, the study shows that term extraction for the isiZulu dictionary of linguistic terms follows reliable computational techniques in corpus lexicography.
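
The keyword analysis described in the abstract, in which word frequencies in a study corpus are compared with those in a larger reference corpus, can be illustrated with a minimal Python sketch. This is not the WS Tools implementation. It assumes the log-likelihood statistic, one common measure of keyness, whereas the abstract does not state which statistic or thresholds were used. The function names (`tokenize`, `log_likelihood`, `keywords`), the whitespace tokenizer, and the `min_freq` cut-off are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer (an assumption): split on whitespace, strip
    surrounding punctuation, and lowercase each token."""
    punct = '.,;:!?()[]"\''
    tokens = (t.strip(punct).lower() for t in text.split())
    return [t for t in tokens if t]


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one word.

    a: frequency in the study corpus   c: size of the study corpus
    b: frequency in the reference      d: size of the reference corpus
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score


def keywords(study_tokens, reference_tokens, min_freq=2):
    """Return (word, study_freq, ref_freq, score) tuples for words that are
    relatively more frequent in the study corpus, highest score first."""
    study = Counter(study_tokens)      # frequency list of the study corpus
    ref = Counter(reference_tokens)    # frequency list of the reference corpus
    c, d = len(study_tokens), len(reference_tokens)
    rows = []
    for word, a in study.items():
        if a < min_freq:
            continue
        b = ref.get(word, 0)
        # Keep only "positive" keywords: higher relative frequency in the study corpus.
        if a / c > b / d:
            rows.append((word, a, b, log_likelihood(a, b, c, d)))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

Calling `keywords(tokenize(study_text), tokenize(reference_text))` on two plain-text corpora would give a ranked list of candidate terms. Such a list is only a starting point for headword selection, and a lexicographer would still check each candidate in context, for example in concordance lines.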


term extraction; LGP corpus; LSP corpus; WordSmith Tools; frequency; wordlist; concord; keyness; lexicography; corpus lexicography; headword selection; LSP dictionary


ISSN 2224-0039 (online); ISSN 1684-4904 (print)

Creative Commons License CC BY 4.0
