Towards a Corpus of South African English: Corralling the Sub-varieties

Leela Pienaar, Vivian de Klerk

Abstract: Within the last twenty years, the use of a corpus for language research has become a sine qua non in many areas of linguistic enquiry. This trend is particularly evident in lexicography, a discipline which has become increasingly and overtly 'corpus-driven'. This article draws on research from a Master's project which involved the collection of a small corpus of Indian South African English (ISAE), an acknowledged component or sub-variety of South African English (SAE). The discussion highlights the importance of aiming for a balanced representation of the known sub-varieties of a language when compiling corpora for lexicographic and linguistic investigation. Since ISAE is primarily an oral dialect, specific focus is given to the methodological challenges involved in compiling a spoken corpus. Methodological insights from local as well as international corpus projects were used to guide and inform the process, including the Xhosa English Corpus, the New Zealand Corpus of Spoken English and the Hong Kong Corpus of Conversational English. The various stages in the research process are described, together with explanations of how problems such as corpus design, the selection of corpus contributors, the data-collection process and the development of guidelines for consistency during corpus compilation were addressed. The article provides a keyhole view of the main lexical and syntactic features of ISAE exemplified in the corpus and sets these against the backdrop of general SAE and trends in World English. The article concludes with a proposal for the collection of parallel corpora of other sub-varieties of SAE, which will provide an objectively compiled repository of language in use to enable researchers to discern the linguistic features at the core and periphery of SAE. It is argued that the establishment of corpora of the various known sub-varieties of SAE could constitute an important step towards the creation of a truly representative large corpus of SAE and ultimately towards a better definition and understanding of SAE.

Keywords: CORPUS, SPOKEN CORPUS, DESIGN, SOUTH AFRICAN ENGLISH, SUB-VARIETIES, INDIAN SOUTH AFRICAN ENGLISH, LEXICOGRAPHY

Opsomming: Op weg na 'n korpus van Suid-Afrikaanse Engels: Die bymekaarbring van die subvariëteite. Gedurende die laaste twintig jaar het die gebruik van 'n korpus vir taalnavorsing die sine qua non op baie gebiede van taalondersoek geword. Hierdie neiging is veral te sien in die leksikografie, 'n vakgebied wat toenemend en merkbaar "korpusgedrewe" geword het. Hierdie artikel maak veral gebruik van navorsing vir 'n Meestersprojek wat die versameling behels het van 'n klein korpus Indiese Suid-Afrikaanse Engels (ISAE), 'n erkende komponent of subvariëteit van Suid-Afrikaanse Engels (SAE). Die bespreking beklemtoon die belangrikheid om na 'n gebalanseerde weergawe van die bekende subvariëteite van 'n taal te streef wanneer korpusse vir leksikografiese en linguistiese ondersoek saamgestel word. Omdat ISAE primêr 'n mondelinge dialek is, word spesifiek gefokus op die metodologiese uitdagings gepaardgaande met die samestelling van 'n gesproke korpus. Metodologiese insigte van sowel plaaslike as internasionale korpusnavorsing is gebruik om leiding en vorm aan die proses te gee. Dit sluit die Xhosa English Corpus, die New Zealand Corpus of Spoken English en die Hong Kong Corpus of Conversational English in. Die verskillende stadiums in die navorsingsproses word beskryf, saam met verduidelikings van hoe probleme soos maniere van korpusontwerp, die keuse van korpusbydraers, die dataversamelingsproses en die ontwikkeling van riglyne vir konsekwentheid gedurende die korpussamestelling gehanteer is. Die artikel verskaf 'n intieme blik op die belangrikste leksikale en sintaktiese eienskappe van ISAE soos beliggaam in die korpus en plaas dit teen die agtergrond van algemene SAE en neigings in Wêreldengels. Die artikel sluit af met 'n motivering vir die versameling van parallelle korpusse van ander subvariëteite van SAE wat 'n objektief saamgestelde bron van taal in gebruik sal verskaf om navorsers in staat te stel om taalkundige eienskappe in die kern en periferie van SAE te onderskei. Daar word geredeneer dat die totstandbrenging van korpusse van die verskillende bekende subvariëteite van SAE 'n belangrike trap kan vorm tot die skep van 'n werklik verteenwoordigende groot korpus van SAE en uiteindelik tot 'n beter omskrywing en begrip van SAE.

Sleutelwoorde: KORPUS, GESPROKE KORPUS, ONTWERP, SUID-AFRIKAANSE ENGELS, SUBVARIËTEITE, INDIESE SUID-AFRIKAANSE ENGELS, LEKSIKOGRAFIE


DOI: https://doi.org/10.5788/19-0-444

ISSN 2224-0039 (online); ISSN 1684-4904 (print)

Creative Commons License CC BY 4.0
