The Lemmatisation of Adverbs in Northern Sotho

To date, Northern Sotho metalexicographers have focused their attention on lemmatisation problems in respect of the so-called main or primary part of speech categories, viz. nouns and verbs. See, for example, Prinsloo and De Schryver (1999) and Prinsloo and Gouws (1996). No attention has been given to the lemmatisation of adverbs. The latter are regarded by Ziervogel and Mokgokong (1975: 114, Introduction) as a "secondary part of speech". The treatment of adverbs in Northern Sotho dictionaries is marred by inconsistencies such as omissions from the macrostructure, insufficient and inconsistent labelling, inferior treatment in the microstructure, under-utilization of the mediostructure and outer texts, and reflects a lack of a strategy of selection of items for lemmatisation. Linguistic descriptions of adverbs in currently available grammars vary substantially and therefore confuse learners of the language and inexperienced lexicographers.¹ The aim of this article is to offer solutions to the lemmatisation problems regarding adverbs in Northern Sotho and to propose guiding entries for paper and electronic dictionaries which could serve as models for future dictionaries. The treatment of adverbs in Northern Sotho dictionaries will also be critically evaluated, especially in terms of frequency of use and target users' needs.


Introduction
According to Prinsloo and Gouws (1996: 103), the lexicographer is the mediator between theoretical linguistics and the everyday language user. In practical terms, this often means that the African-language lexicographer has to take great pains in lemmatising grammatically complex systems in a user-friendly way on the level of the target user. Typical examples are the lemmatisation of nouns, verbs, reflexives, adjectives and especially copulatives (cf. Prinsloo 2002). A dictionary should not primarily reflect the attitude of the lexicographer; it should rather be aimed at the specific needs of a well-defined target user. It will be illustrated in terms of adverbs that lexicographers should strive to lemmatise adverbs in Northern Sotho in such a way that the whole spectrum of occurrences of adverbs is covered with maximum utilization of all lexicographic mechanisms at their disposal. The user-perspective, and especially the need for modern dictionaries to be user-friendly, has been prominent in lexicographic studies of the past decade (cf. Gouws and Prinsloo (1998), Hartmann and James (1998), Prinsloo and De Schryver (1999), Gouws (2000), etc.) and will be regarded as a given in this article. The South African situation moreover often demands dictionaries to be accessible to a wider user group than originally envisaged by the compiler. Lexicographers should therefore strive towards maximum polyfunctionality of their dictionaries. Special attention should be given to the encoding needs of learners, in this case to the need to find enough information in dictionaries in order to actively use adverbs in speech and writing.
The aim of this article is to offer solutions to the lemmatisation problems regarding adverbs in Northern Sotho. It also attempts to show how macrostructural and microstructural strategies as well as the mediostructure can be maximally utilized in order to reach this objective. The different kinds of adverbs distinguished for Northern Sotho appear thousands of times in the Pretoria Sepedi Corpus. These enormous overall counts clearly indicate not only that they should be included as lemmas but also that an exhaustive treatment is required and/or justified, especially for the encoding needs of inexperienced target users. Prerequisites will be to obtain an overall picture of the adverbial system and to find appropriate lemmatisation strategies for the different types of adverbs in Northern Sotho. The question is therefore what the lexicographer has to know about the adverb in Northern Sotho in order to embark on successful lexicographic treatment of adverbs and how to lemmatise them in a user-friendly way. It cannot be expected of him/her, however, to solve deeply-rooted theoretical differences between linguists on the approaches to the description of adverbs.
It will also be emphasized that in order to lemmatise adverbs successfully, the lexicographer should not hesitate to go beyond 'word boundaries'³ in the selection of lemmas. Lexical elements smaller than words, such as affixes, and lexical elements larger than words, such as adverbial phrases, should be considered for lemmatisation. Gouws (1989: 84) correctly emphasizes that the traditional focus on the word as representative of the lexicon should be shifted to lemmas representing the lexical items of the particular language.
Although general definitions of adverbs vary, they all formulate the core function of adverbs as describing or modifying a clause or action in terms of especially time, place and manner.
An adverb is a word such as 'slowly', 'now', 'very', 'politically' or 'fortunately' which adds information about the action, event, or situation mentioned in a clause. (Sinclair 1995: 27)
… a word used for describing a verb, an adjective, another adverb, or a whole sentence. Adverbs in English often consist of an adjective with '-ly' added, for example 'quickly', 'mainly', and 'cheerfully'. (Rundell 2002: 20)
… to describe how, where, when or how often something happens … (Procter 1995: 20, textbox)
Adverbs are words which qualify or describe verbs, adjectives and other adverbs in some or other way. (Van Wyk et al. 1992: 118)
… adverbs describe the nature of the action in terms of time, place and manner. (Louwrens 1991: 26)
It could be argued that learners and prospective, inexperienced compilers find the description and treatment of adverbs in currently available dictionaries and grammars of Northern Sotho unsatisfying and even confusing.
Firstly, a wide range of terminology is used to refer to the different kinds of adverbs, viz. basic adverbs, genuine adverbs, common adverbs, secondary derivations, derived adverbs, adverbs that developed from other categories, adopted adverbs, descriptive adjuncts and pseudo-adverbs. With particular reference to adverbial phrases, the terms particles, prepositions and prefixes are used to describe the same kind of lexical elements, depending on the theoretical framework favoured by the author in question. On the one hand, different terms such as basic adverbs and genuine adverbs are used to refer to the same type of adverb, while on the other, a single term, for example derived adverbs, is used to refer to different types of adverbs by different compilers. The learner can also easily assume, mistakenly, that adverbs derived from other categories and adverbs developed from other categories are the same type of adverb. The latter, however, refers to adopted adverbs. Louwrens (1991) regards ka, le, go, etc., which introduce adverbial groups, as particles, but Poulos and Louwrens (1994) call them prefixes.
Secondly, Louwrens (1991: 26) says "it is preferable not to regard particle groups as adverbs …" but in Poulos and Louwrens (1994) these groups are indeed regarded as adverbs (see main and subcategories 1 to 6 in Table 2).
The potential confusion for the learner and the prospective lexicographer can also be illustrated by means of kudu 'mainly'. Lombard (1985: 166) says it is a basic adverb not related to any other word category. Poulos and Louwrens (1994: 341) agree and add that it is not derived from any other word category and that it has an inherent adverbial meaning. Ziervogel and Mokgokong (1975: 114, Introduction) refer to it as a noun which is a common adverb, and in the central text indicate the part of speech of kudu as adverb. Kriel and Van Wyk (1989) label it as a noun of class 9 and offer no treatment of its adverbial characteristics in the entire article of the lemma kudu. Van Wyk et al. (1992) and Lombard (1985) recognize 3 basic types of adverbs. Louwrens (1991) distinguishes the categories time, place and manner. Lombard (1985: 168) does not make provision for adverbs of place and says that the so-called adverbs of place are not adverbs. Van Wyk et al. (1992) only say adverbs qualify "in some or other way". Louwrens (1991: 26), in contrast to Lombard (1985) and Van Wyk et al. (1992), does not categorise adverbs in terms of basic, derived and adopted.⁴ Poulos and Louwrens (1994) describe adverbs in terms of their derivations and distinguish no fewer than 9 main categories and up to 17 subcategories. Ziervogel and Mokgokong (1975), in contrast to the other linguists, disregard the category "basic adverb". In fact they describe the nature of adverbs in a rather clumsy way. A dead reference in respect of the final category ga- adds to the user's predicament since vital information required to complete the paradigm cannot be retrieved at this point.
Other parts of speech are used as adverbs, or adverbs may be formed by affixing prefixes or suffixes to other parts of speech. Nouns are often used unchanged as adverbs. … secondary derivations with the secondary formatives ka-, le-, ga-, go- may also be regarded as adverbs. … Adverbs, usually those of quality, are derived from adjective and relative stems by means of ga-. (Ziervogel and Mokgokong 1975: 114-115, Introduction)
Such inconsistencies, whether justified or not, have a negative effect on the learner's and/or user's information retrieval efforts. The issue here is not the validity of their views; criticism on linguistic grounds lies beyond the scope of this article. Furthermore, one should also accept that the adverb can be described from more than one angle and that progressive linguists have the academic right to change their minds. The concern lies with the learner who tries to master the nature and use of adverbs in Northern Sotho and with the lexicographer in his/her role as mediator who finds it difficult to obtain a comprehensive overview of the adverb in order to treat it satisfactorily on the macrostructural and microstructural levels in dictionaries.
http://lexikos.journals.ac.za
Thirdly, a single glance at the treatment of adverbs in Northern Sotho dictionaries reveals far too many inconsistencies and errors. Kriel (1983) includes the lemma ga(n)nyane, which means that the lemma could either be ganyane or gannyane. This lemma is placed in the wrong alphabetical position for either ganyane or gannyane. There is also another treated lemma gannyane, again in an incorrect alphabetical position. He gives ga n'.nyane as comment on form for ga(n)nyane but ga nya.ne as comment on form for gannyane. Kriel (1950) is inconsistent in respect of circumflexes and POS indication regarding adverbs. As an example of the latter, he labels gatee 'once' and gararo 'three times' as adverbs but not gabedi 'twice'. Ziervogel and Mokgokong (1975) lemmatise the question particles afa, na and naa but indicate the POS of afa as adverb. Incorrect alphabetical sorting of lemmas is a common problem in Kriel and Van Wyk (1989), e.g. for gakale, compare De Schryver and Lepota (2001, Note 6). Missing punctuation, for example a question mark at gakakang, and typing errors such as by. instead of byw. at gakalo are unfortunate. In the latter case the user can interpret the incorrectly spelt label as a translation equivalent and may incorrectly conclude that gakalo means by 'at' instead of 'so many'.

Form and meaning of adverbs in Northern Sotho
A prerequisite for successful lemmatisation strategies for, and treatment of, adverbs is a thorough understanding of the nature of adverbs in Northern Sotho. Poulos and Louwrens (1994: 328) say:
The analysis of the adverb can be approached in different ways. One could, for example, classify adverbs according to whether they express the concepts of time, place, manner, etc. Or one could describe them in terms of their derivation, that is, in terms of the prefixes and/or suffixes that are used.
Louwrens (1991: 26) says "adverbs describe the nature of the action in terms of time, place and manner" and gives the following examples.
- Adverbs of time: Pula e nele maabane 'It rained yesterday'
- Adverbs of place: Ba dutše moriting 'They are sitting in the shade'
- Adverbs of manner: Masogana a ja kudu 'The young men eat a lot'
Van Wyk et al. (1992: 118) distinguish three types of adverbs, namely basic adverbs, derived adverbs and adverbs that have been adopted from other word categories.
The nature of the description (time, place and manner), the 3 basic types of adverbs (basic, derived, adopted and particle groups) and the way in which they are formed will now be interlinked in two ways in Tables 1 and 2. Table 1 interlinks the categories of time, place and manner with the three basic types of adverbs that occur in Northern Sotho, and with the way in which they are formed. Table 2 is based upon the way in which adverbs are formed, thus reflecting the viewpoint of Poulos and Louwrens (1994), and interlinked with the three basic types of adverbs as well as with the categories time, place and manner.
In this way the viewpoints of all the above-mentioned authors as well as most of their examples are catered for.⁵ The purpose of the compilation of Tables 1 and 2 is threefold. Firstly, either or both of these tables can assist the lexicographer in obtaining a comprehensive overview of the adverb in Northern Sotho. Secondly, these tables can be used in the back matter of a paper dictionary, or, thirdly, in pop-up information boxes in an electronic dictionary. It is for the lexicographer to decide whether he/she prefers to base the back matter entry (entries) and pop-up box(es) on, say, Table 1 or Table 2 or both, and whether to use these tables as they are or to adapt them to the level of the target user of the dictionary. Thus, for example, kudu in Table 1 is a basic adverb of manner belonging to the subcategory (vii) "basic, non-derived adverbs with an inherent adverbial meaning" within the main category 9 "word categories which may function as adverbs without the addition of any prefixes or suffixes" of Poulos and Louwrens (1994). Given the presentations of the different linguists of adverbs in Northern Sotho, as well as Tables 1 and 2, it is for the lexicographer to decide on the best angle of approach for lemmatisation of these adverbs. He/she can decide to approach the lemmatisation of adverbs departing from the way in which they are formed, or from the basic types of adverbs or even in terms of their function. Whatever the preferred angle might be, sound decisions regarding lemmatisation, treatment in the microstructure, utilization of the mediostructure, and treatment in the user's guide and back matter have to be taken. In this article, lemmatisation will be attempted on the basic types of adverbs.
In addition to information on frequency, corpus lines and information on collocates obtained from the corpus can be very useful to the lexicographer. Compare for example information on the most frequent collocates of ruri in Table 4. From this extract from the collocates table for ruri it is clear that the possessive concord, classes 7/8, sa occurs very frequently one position to the left (L1) of ruri, thus 357 occurrences of sa ruri. Likewise, the copulative stem -le, and in fact the entire copulative verb e le, are indicated as frequent collocates of ruri in the positions L1 and L2 respectively. Sa ruri and e le ruri are therefore prime candidates for inclusion in the microstructural treatment of ruri. Let there furthermore be no doubt that the corpus is a most valuable source for, among others, sense distinction, typical examples, collocations, and decisions on inclusion in or omission from the dictionary. Compare De Schryver and Prinsloo (2000, 2000a and 2000b) for an exhaustive overview of corpus compilation and corpus utilization on macro- and microstructural levels.
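The positional collocate counts discussed above (L1, L2, etc.) can be approximated with a few lines of code. The following is a minimal sketch only: it assumes the corpus is available as a flat list of orthographic tokens, the function name `positional_collocates` is hypothetical, and the toy token sequence is assembled from the collocations named in the text rather than drawn from the Pretoria Sepedi Corpus.

```python
from collections import Counter


def positional_collocates(tokens, node, span=2):
    """Count which words occur at each offset (-span..+span) around `node`.

    Negative offsets are positions to the left (L1 = -1, L2 = -2),
    positive offsets are positions to the right (R1 = +1, R2 = +2).
    """
    counts = {off: Counter() for off in range(-span, span + 1) if off != 0}
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for off, counter in counts.items():
            j = i + off
            if 0 <= j < len(tokens):  # ignore positions outside the text
                counter[tokens[j]] += 1
    return counts


# Toy token sequence (an assumption for illustration, not corpus data).
sample = "sa ruri e le ruri sa ruri".split()
collocates = positional_collocates(sample, "ruri", span=2)

# Most frequent words one position to the left of the node word.
print(collocates[-1].most_common())
```

On real data the same counts would be read off per position and ranked, so that combinations such as sa ruri and e le ruri surface as candidates for the microstructure.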

Lemmatising derived adverbs
Since the number of derived adverbs is unlimited or open-ended, it is not possible to lemmatise all forms separately. From a lexicographic angle, a number of issues are at stake here. Firstly, there is a need for selection in the case of such paradigms. Secondly, the lexicographer has to take decisions in terms of paradigm completion. Thirdly, the lexicographer has to consider certain affixes and particles/prepositions for inclusion in the macrostructure, i.e. as lemmas in their own right. The following analysis is a typical example of how such instances should be approached.
Consider firstly the open-ended paradigm gatee 'once', gabedi 'twice', … galesome 'ten times', … galekgolo 'hundred times', … gadiketekete 'thousands of times' … . The lexicographer in his/her role as mediator has to take certain decisions, e.g. in respect of inclusion into or omission from the dictionary, after having studied corpus data and available dictionaries. The frequency counts reveal a rather interesting pattern. From Table 5 and Figure 1 it is clear that, from a frequency angle, once, twice and three times are much more frequently used than four times up to nine times, with relatively high frequencies for "rounded off" numerals such as ten times and hundreds of times. Treatment in existing dictionaries indicates that the compilers did fairly well on intuition but did miss out on frequently used items: Sediba (Lombard et al. 1992) in particular omits twice, four times and five times, and NEnSeD (Kriel 1950), Pukuntšu (Kriel 1983), Pukuntšu (Kriel and Van Wyk 1989) as well as Sediba omit ten times.
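The comparison between corpus frequencies and existing dictionary coverage can be sketched as a simple check. This is an illustration under stated assumptions: the function name `frequent_but_missing`, the threshold, the lemma list of an imaginary dictionary and all counts below are placeholders invented for the example; they are not the figures of Table 5 and do not describe any of the dictionaries cited.

```python
def frequent_but_missing(corpus_freqs, lemma_list, min_count):
    """Return paradigm members at or above `min_count` in the corpus
    that are absent from a dictionary's lemma list, most frequent first."""
    ranked = sorted(corpus_freqs.items(), key=lambda item: -item[1])
    frequent = [word for word, count in ranked if count >= min_count]
    return [word for word in frequent if word not in lemma_list]


# Placeholder counts (invented for illustration, NOT the Table 5 figures).
placeholder_freqs = {
    "gatee": 500,
    "gabedi": 300,
    "gararo": 150,
    "galesome": 40,
    "galekgolo": 10,
    "gadiketekete": 5,
}

# Lemma list of a hypothetical dictionary.
hypothetical_lemmas = {"gatee", "gararo"}

gaps = frequent_but_missing(placeholder_freqs, hypothetical_lemmas, min_count=30)
print(gaps)
```

The threshold is a lexicographic decision rather than a technical one; the sketch only makes visible which frequent items a given lemma list lacks.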
As far as the principle completing-a-paradigm is concerned, two strategies are suggested. Firstly, the lexicographer could complete the 1-to-10 paradigm by also entering four times up to ten times as separate lemmas, although in terms of frequency counts, this cannot wholly be justified. Secondly, the rest of the open-ended paradigm could be addressed by lemmatising the outstanding "beacons" such as ten times, hundreds of times, thousands of times, etc. Guidance in respect of the paradigm as a whole could be given by appropriate cross-referencing to the back matter. The back matter section would then explain the normal (rather complicated) numerical system of Northern Sotho for expressing numbers from say 1 to 10 and 11 up to 10 000 000 and/or contain references to grammar books where this system is described. Thirdly, the prefix ga- (used to derive these adverbs) should be entered as a separate lemma, cf. (1). Compare Gouws (1989: 84) for the importance of lemmatising elements bigger than words and also elements smaller than words.
(1) ga- adv prefix
gatee 'once' < tee 'one' Ba mmethile gatee fela They hit him only once.
gabotse 'well' < botse 'lovely' Sepela gabotse! Go well!
gantši < -ntši; gammogo < -mmogo; gagolo < -golo; gabedi < -bedi ► BM 2.8⁷
This suggested entry not only caters for the numerical paradigm but also covers the other most frequent typical adverbs formed by means of this derivation strategy, cf. Poulos and Louwrens' Category 8, either as a treated sublemma in the case of gabotse or as untreated sublemmas such as gantši, gammogo and gagolo. This brings us to another open-ended paradigm, namely all adverbs derived by means of the adverbial prefix ga- (of which the numerals just discussed are only a subsection). Once again the lexicographer has to find a strategy for inclusion or omission. Consider the most frequently used adverbs in this broader category (Table 6). From this table it is clear that NEnSeD (Kriel 1950) missed out on very frequently used adverbs such as gannyane, gabotsebotse, gabonolo, Sediba (Lombard et al. 1992) on gagolo, etc. The high frequency counts for gabotsebotse and gabotsana furthermore urge the lexicographer to venture beyond the boundaries of the "basic word", viz. those consisting of a prefix and a stem, and also to consider forms with reduplicated stems and diminutive forms for lemmatising and not merely the basic forms ga + stem. The next step is to look into all other derived adverbs, especially Poulos and Louwrens' Categories 1-7 and 9(vi).
Here, each of the particles ka, le, go, ga, mo and kua as well as the suffix -ng should be lemmatised with elaborate attention in the microstructure of each article to its function as initiator of adverbial groups. For example, one should attempt to cover all of the P&L categories 1(i) to 1(vii) in the treatment of the lemma ka. The lexicographer should, preferably in the user's guide, take a clear stand on the use of the terms preposition versus prefix versus particle, and should not burden the user with grammatical labels such as prep./pref./part., cf. (2).
(2) ka part. [intr. adv. phrases], o sepela ka sefatanaga she goes by car; ka Labobedi on Tuesday; ka toropong in town. ► BM 1.1-1.3
As in the case of the prefix ga- above, the lexicographer should not hesitate to lemmatise the locative suffix -ng as an article in its own right. Poulos and Louwrens' Categories 1(iii), 1(v) and 9(vi) also require special attention. Here the lexicographer should be prepared to lemmatise multiword lemmas such as ka ga, la mathomo 'beginning', la bobedi 'for the second time' and even, not mentioned by Poulos and Louwrens, ka mo, ka kua, etc. Furthermore, in the case of la bobedi for example, appropriate cross-references should be given to Labobedi 'Tuesday' and bobedi 'second'. Consider their frequencies in the corpus (Table 7). It should be reiterated that the lexicographer should also and always use the corpus as an invaluable aid to the lexicographic treatment of all types of adverbs, of which the study of concordance lines like those in Table 8, generated for the adverb galesome 'ten times', is a typical example.
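Concordance lines of the kind referred to here place each occurrence of a node word in the middle of its immediate context (keyword in context). The following minimal sketch assumes a tokenised text; the function name `kwic` is hypothetical, and the single example sentence is the one given under entry (1), used in place of corpus data.

```python
def kwic(tokens, node, width=3):
    """Return (left context, node, right context) for every occurrence
    of `node`, with up to `width` tokens of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Example sentence taken from entry (1); real use would pass corpus tokens.
tokens = "Ba mmethile gatee fela".split()
concordance = kwic(tokens, "gatee", width=2)

for left, node, right in concordance:
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>25}  {node}  {right}")
```

Aligning the node word in a fixed column is what makes recurring left and right neighbours easy to spot when many lines are read together.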

Lemmatising adopted adverbs
In the case of adopted adverbs, the lexicographer is once again confronted with limited or even open-ended paradigms but also with difficult decisions regarding the functions as adverbs versus nouns, especially in terms of part-of-speech indication. Firstly, a number of paradigms, this time mostly on a semantic level, have to be dealt with, such as lehono : maabane : maloba 'today : yesterday : the day before yesterday', fase : godimo : morago 'below : above : behind', leboa : borwa : bohlabela : bodikela 'north : south : east : west', etc. Frequency of use and the obligation to complete such semantic paradigms should be the norm.
Lemmatising nouns that are often or even exclusively used as adverbs twice, once with POS-label adverb and again with POS-label noun, would be totally redundant. In the microstructural treatment, lexicographers often opt for indicating the POS in such cases as noun with no reference to a possible adverbial function. Neglecting the POS adverb in this way can however only be tolerated up to a point where the labelling of adverbs as nouns becomes artificial and questionable, especially in those cases where nouns are exclusively used as adverbs. The question here is whether the part of speech of nouns that are exclusively used as adverbs should be indicated as noun, adverb or both. What should definitely be avoided is a situation where the same adverb is labelled differently in different dictionaries, or even in different editions of the same dictionary, or where clearly "related" adverbs (i.e. belonging to the same paradigm) are labelled differently in the same dictionary. Consider the treatment of the three words listed by Lombard (1985: 167) as adverbs that developed from class 6 nouns, i.e. maabane, maloba and mantšiboa, as a case in point.
All these dictionaries offer a single entry for each of these words. In (5) both dictionaries label maabane as an adverb; in (6)(a) maloba is labelled as a noun with no separate entry or reference whatsoever to adverb, but in (6)(b) as an adverb. In (7)(a) a single entry is given for mantšiboa but with dual labelling of its function. In (7)(b), in contrast to (5)(b) and (6)(b), the lemma is now labelled as a noun but in a later edition of the same dictionary, i.e. (7)(c), also labelled as an adverb. Different options can be considered here. The lexicographer could simply ignore the overwhelming or even exclusive function of such nouns as adverbs and consistently label them as nouns (coupled with an explanation in the user's guide and/or back matter of the dictionary) as in (6)(a) and (7)(b). Alternatively, the lexicographer could decide to label the POS as adverb in cases where nouns are exclusively used as adverbs, as in (5) and (6)(b), or even in addition to the label noun, as in (7)(c). A third possibility, which would represent a sound application of the metalanguage, could be to order the POS-labels according to the dominant function, i.e. n./adv. if the nominal function is more frequent or adv./n. if the word is more frequently used as an adverb. This has to be clearly explained in the front matter of the dictionary. The dominant function can be determined on the basis of frequency counts in the corpus.
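The third option, ordering the POS-labels by dominant function, amounts to a small decision rule. The sketch below is one possible reading, not a prescribed procedure: the function name `pos_label` is hypothetical, the counts are assumed to come from corpus lines that have already been classified as nominal or adverbial, and the handling of ties and of words attested in only one function is an assumption added for the example.

```python
def pos_label(noun_count, adverb_count):
    """Order the POS labels by dominant function, based on corpus counts.

    Assumptions of this sketch (not stated in the article):
    - a word attested in only one function gets a single label;
    - a tie falls back to 'n./adv.'.
    """
    if noun_count < 0 or adverb_count < 0:
        raise ValueError("counts must be non-negative")
    if noun_count == 0 and adverb_count == 0:
        raise ValueError("no corpus evidence for either function")
    if adverb_count == 0:
        return "n."
    if noun_count == 0:
        return "adv."
    return "n./adv." if noun_count >= adverb_count else "adv./n."


# Invented counts for illustration only.
print(pos_label(noun_count=10, adverb_count=90))
print(pos_label(noun_count=90, adverb_count=10))
```

Whatever rule is adopted, it should be applied to all members of a paradigm alike so that related words such as maabane, maloba and mantšiboa are not labelled by different principles.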

Electronic dictionaries
Generally speaking, many more options are available to the lexicographer in electronic dictionaries and fewer restrictions exist in terms of access, available space, mediostructure, etc. See Prinsloo (2001) and De Schryver (2003) for detailed discussions of electronic dictionaries. For example, pop-up screens alone can instantly provide the user with a wealth of information on various aspects of adverbs. This could for instance be done as shown in (8) by simply momentarily resting the cursor on the label adverb. Note that all this information, brought together in an instant, also narrows the gap between dictionary and grammar, which is generally believed to be "unbridgeable" (cf. Geeraerts 2000: 77).

Conclusion
Compiling user-friendly dictionaries of a high lexicographic standard for African languages poses a great challenge to prospective lexicographers. They often are the mediators between complicated grammatical structures and the decoding and encoding needs of their target users. Adverbs should not be lemmatised haphazardly as they cross the compiler's way. They should be carefully researched and lemmatised in a structured way. Lexicographers should be aware of the fact that different subcategories of the same phenomenon might require different lexicographic treatments, as in the case of basic adverbs versus derived adverbs versus adopted adverbs. This is even true for subcategories within a given category, such as the various approaches required for different categories of adverbs derived by ga-. On the macrostructural level, candidates for inclusion (or omission) should carefully be considered, preferably based on corpus data. On the microstructural level, data should be presented in such a way that the needs of both encoding and decoding users are met, and the mediostructure should be maximally utilized. The ultimate aim should be to ensure an unimpeded information retrieval process in respect of:
- easy access to the lemma,
- successful information retrieval in the microstructure,
- added value obtained in following up on cross-references,
- useful guidance from the user's guide in the front matter,
- a comprehensive overview of adverbs in the back matter, and
- appropriate references to external sources such as grammar books.
1.
An estimated 80% of freelance lexicographers and lexicographers employed by the National Lexicography Units in South Africa have little or very limited lexicographic experience.

3.
"Word boundaries" should here be interpreted as for orthographic words.

4.
The term "adopted adverb" is used in terms of Van Wyk et al. (1992) in this article and should not be interpreted in the more general sense of adopted 'borrowed from another language'.

5.
Note that the status of ideophones as adverbs is not recognized in these classifications and requires further research.Compare also Poulos and Louwrens (1994: 351).

6.
If Christian religious data in the corpus are taken into account.

7.
The symbol ► is a reference marker referring the user to the reference address which in this case is Section 2.8 in the back matter (BM).

Table 1: The categories time, place and manner linked to basic types of adverbs and the way in which they are formed (P&L = Poulos and Louwrens 1994)

Table 2: The way in which adverbs are formed, linked to the basic types of adverbs and the categories time, place and manner (P&L = Poulos and Louwrens 1994)

Table 3: Overall frequencies of basic adverbs in the corpus

Table 5: Overall frequencies of the numeral paradigm gatee, gabedi, … gadimilione in the corpus

Table 6: The most frequently used adverbs derived by means of the prefix ga-

Table 7: Frequencies of multiword adverbs that are candidates for lemmatisation