Semi-Automatic Retrieval of Definitional Information: A Northern Sotho Case Study
Abstract: Corpus-based terminology is currently gaining ground on the international front. It is therefore important that terminologists working on the South African Bantu languages not only take note of this development but also follow this trend, even if they do not have the same measure of access to highly sophisticated software. The aim of this article is therefore to establish whether it is possible to retrieve definitional information on key concepts from untagged, running text by making use of affordable and easily accessible software such as WordSmith Tools. In order to answer this question, a case study is conducted in Northern Sotho, using textual material on linguistics as the basis for a special field corpus. Syntactic and lexical patterns serving as textual markers of definitional information are identified, and the success rate of the computational retrieval of definitional information is analysed and evaluated. Attention is also paid to the retrieval of specifically conceptual information, which turned out to be a fortunate by-product of the semi-automatic retrieval of definitional information. Finally, it is illustrated how the definitional information retrieved can be utilised in the writing of a formal terminological definition.
Keywords: TERMINOLOGY, SOUTH AFRICAN BANTU LANGUAGES, DEFINITIONAL INFORMATION, SEMI-AUTOMATIC INFORMATION RETRIEVAL, TERMINOLOGICAL DEFINITIONS, CONCEPTUAL RELATIONSHIPS, LEXICAL PATTERNS, SYNTACTIC PATTERNS, TEXTUAL MARKERS, KEYWORD-IN-CONTEXT (KWIC), WORDSMITH TOOLS
Summary (Opsomming, translated from Afrikaans): Semi-automatic retrieval of definitional information: A Northern Sotho case study. Corpus-based terminology is currently gaining ground on the international front. It is therefore important that terminologists working within the South African Bantu languages not only take note of this development but also follow this trend, even if they do not have the same degree of access to sophisticated computer software. The aim of this article is therefore to establish whether it is possible to retrieve definitional information on key concepts from untagged, running text by using affordable and accessible software such as WordSmith Tools. To answer this question, a case study was conducted in Northern Sotho, using textual material on linguistics as the basis for a specialised corpus. Syntactic and lexical patterns that serve as textual markers of definitional information are identified, and the success rate of the computational retrieval of definitional information is analysed and evaluated. Attention is also given to the retrieval of specifically conceptual information, which is an unexpected by-product of the semi-automatic retrieval of definitional information. Finally, it is illustrated how definitional information can be used in writing a formal terminological definition.
Keywords: TERMINOLOGY, SOUTH AFRICAN BANTU LANGUAGES, DEFINITIONAL INFORMATION, SEMI-AUTOMATIC INFORMATION RETRIEVAL, TERMINOLOGICAL DEFINITIONS, CONCEPTUAL RELATIONSHIPS, LEXICAL PATTERNS, SYNTACTIC PATTERNS, TEXTUAL MARKERS, KEYWORD-IN-CONTEXT (KWIC), WORDSMITH TOOLS
Copyright of all material published in Lexikos will be vested in the Board of Directors of the Woordeboek van die Afrikaanse Taal. Authors are free, however, to use their material elsewhere provided that Lexikos (AFRILEX Series) is acknowledged as the original publication source.
Creative Commons License CC BY 4.0