Loan Words versus Indigenous Words in Northern Sotho — A Lexicographic Perspective

The aim of this article is to investigate, from a lexicographic perspective, the preferences of Northern Sotho mother-tongue speakers for loan words versus so-called 'traditional' or 'original' counterparts in the language. Results obtained from a survey conducted among 100 randomly selected mother-tongue speakers from different age and gender groups, backgrounds, places of residence, etc. will be analysed. It is shown that although the overwhelming preference of the respondents lies with the use of (more) indigenous words in comparison to loan words, lexicographers should be alerted to possible, even rapid, changes in this preference pattern. The results from the survey are compared throughout with frequency counts derived from a corpus as well as with current dictionary treatment.

to point out unacceptable use. According to this view the dictionary should rectify and cleanse the language, preserve its purity, lengthen its duration, correct or ban improprieties and absurdities, censor faulty usage and repress anomaly. This authoritarian tradition, however, collapsed mainly due to one simple principle, viz. language change. In the words of Philip Gove (1961³: 4a), the Editor-in-Chief of Webster's Third:
"English like other living languages is in a metabolic process of constant change. The changes affect not only word stock but meaning, syntax, morphology and pronunciation."
Indeed, in contrast to the prescriptive approach stands the descriptive approach, with its focus on actual language usage. Gove (1961a: 13, quoted in Al-Kasimi (1977: 84)) writes in a letter to Life Magazine:
"The responsibility of a dictionary is to record the language, not set its style. For us to attempt to prescribe the language would be like Life reporting the news as its editors would prefer it to happen."
Prinsloo (1992: 10) rightly emphasizes that the then Northern Sotho Language Board made an invaluable contribution towards the clarification, systematisation, standardization and coining of new terms for Northern Sotho in, for example, religion, news broadcasting, mathematics, general science, etc. The last Terminology and Orthography for Northern Sotho (T&O for short) produced by this language board was published in 1988 (Departmental Northern Sotho Language Board 1988⁴). The Language Board adopted a sensible approach in being prescriptive in the coinage and approval/disapproval of terminology on the one hand, while still placing a high premium on actual usage as criterion for acceptability on the other hand. The Language Board (T&O: 3) allows for more than one option rather than attempting to enforce just one term while suppressing others:
"It is anticipated that practical usage of the terms offered will prove that some of them can be replaced by other more commendable ones."
The same holds true for the Language Board's attitude towards certain entries in existing dictionaries (T&O: 1):
"[C]ertain new terms and concepts are included which appear in some dictionaries but which are not generally accepted in the language yet. In such instances the Northern Sotho Language Board devised a term of their own which in their opinion is more appropriate."
Let there be no doubt that actual usage and not the sentiments of a language board will eventually determine whether a word should be included in or omitted from a dictionary. Zgusta (1971: 187) strongly argues:
"Lexicographers can coin new expressions, they can normalize their form and meaning, they can systematize and clarify the old ones, they can help in an endless number of such exceedingly useful and necessary tasks. The real life of a language, however, is in its use; and the definitive, full-fledged stabilization of the standard national language is brought about by its being really and extensively used in literature and in oral communications of all types."
It is interesting to note, for example, that as far as the months of the year are concerned, the then Language Board prescribed the use of "Sothoised" terms like Matšhe 'March' instead of Hlakola, Aprele 'April' instead of Moranang, etc. If the actual usage of the traditional terms were to gain in importance or even prevail, a normalising board would have to back down and take a more descriptive approach.
As will be indicated below, the names for the months of the year seem to be an exception to the general preference pattern, in that in this case loan words are preferred to the traditional words. Note, however, that there might be other factors influencing the choice in respect of months of the year, such as the lack of a one-to-one correlation between the traditional name and the actual month to which it refers. As a matter of fact, Ziervogel and Mokgokong (1975: 828, 1022, 1039) state that Moranang can refer to both April and June, Phato to both August and October, Pherekgong to both January and March, etc., which is of course unacceptable in real-life situations where activities are punctually scheduled in terms of date and time. In simple terms it means that two people can agree on meeting each other on, say, the 1st of Pherekgong but then miss each other by two months.
More recently, metalexicographers such as Bergenholtz (2002: 12) have introduced the term proscriptiveness:
"I wish to suggest a specification and the introduction of a new term, proscription, which in actual fact is only new as a term, since the phenomenon itself is known in many dictionaries around the world. What is meant is the suggested use of a certain variant based on an exact analysis of an empirical basis without prohibiting other existing variants."
Within a proscriptive framework, the paradox that even a descriptive dictionary has a prescriptive effect on the target users is taken into account (compare Bergenholtz 2001). With reference to the current study, the task of the proscriptive compiler of dictionaries for Northern Sotho in terms of loan words versus their (more) indigenous counterparts is thus to reflect user preferences in the selection of lemma signs on the macrostructural level as well as in the extent of treatment on the microstructural level, while still allowing for other existing variants. Consequently, within a proscriptive approach towards the lemmatisation of loan words in contrast to their 'traditional' or 'original' counterparts, it is imperative for the lexicographer to know what the preferences of the target-user group are in this regard. This is exactly what will be pursued in the following paragraphs.

The survey
The scope of the research was limited to a random sample of 100 respondents, all mother-tongue speakers of Northern Sotho. The survey was conducted in May 2002, and the breakdown in terms of gender, age, birthplace and education/job is presented in condensed format in (1).
(1) Loan word survey - Basic respondents' data (N = 100)
A total of 64 single words were presented to the respondents in pairs, thus 32 pairs each containing a loan word and a (more) indigenous counterpart, e.g. radio versus seyalemoya 'radio', or dimonamonane versus malekere 'sweets', etc. Respondents were asked to mark the alternative(s) which they would like to see included in a Northern Sotho dictionary. A third column was added for comments and suggestions of other words considered to be still better than the two choices offered. Respondents were also invited to report spelling errors or to suggest improvements in spelling, and even to motivate why a word should be included in or excluded from the dictionary. Finally, an informal conversation was conducted with each respondent in order to obtain additional information and an overall impression. A typical example of a completed questionnaire is reproduced in the Appendix. Compare also the translated version of the questionnaire in (2).
We assure you that all the information you have provided will be processed anonymously.

THANK YOU FOR YOUR COOPERATION!
Note that the pairs were not presented in a fixed order. In some cases the loan word L is given first, followed by the (more) indigenous word I; sometimes it is the reverse. The loan words themselves are all adopted from Afrikaans, English or Zulu, occasionally Sothoised, at other times direct borrowings. The corresponding (more) indigenous forms are either new coinages in Northern Sotho, shifts in meaning from existing Northern Sotho words, or simply traditional/original words.

Analysing the survey
Basically, there are three levels on which the questionnaires were analysed. Firstly, each respondent's input was analysed in isolation. This resulted in 100 user profiles (Level 1). Secondly, these 100 profiles were summed, upon which general tendencies became evident (Level 2). Thirdly, each L-I pair was studied separately, which provided interesting and highly specific data for each L-I pair. For each of these pairs, frequency counts derived from a corpus as well as the treatment in all currently available dictionaries were also taken into account (Level 3). Space considerations unfortunately do not allow us to present the outcome on these three levels exhaustively. Rather, some representative findings will be singled out.
The 100 filled-in questionnaires were processed in spreadsheet format, reflecting each pair and the full statistical response for every pair (viz. a score for L, a score for I, and a score for both), and this for Respondent 1, 2, ... up to Respondent 100. See (3).
(3) Analysing the loan word survey - Levels 1 and 2
From (3) one can for instance see that 57% of the respondents opted for seyalemoya, 17% for radio, and as many as 25% for both. (Only one respondent did not opt for anything for this pair.) If one focuses only on L or I, then 42 (= 17 + 25) respondents can agree with radio and 82 (= 57 + 25) with seyalemoya, as shown in the last column.
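The tally for a single pair can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the spreadsheet actually used in the study; the response labels ("L", "I", "both", "none") and the function name `tally` are assumptions introduced here, and the response list is rebuilt from the counts quoted for the radio / seyalemoya pair.

```python
from collections import Counter

# Responses for the radio / seyalemoya pair, rebuilt from the counts
# reported in the text: 17 x L only, 57 x I only, 25 x both, 1 x no answer.
# The labels themselves are hypothetical.
responses = ["L"] * 17 + ["I"] * 57 + ["both"] * 25 + ["none"] * 1

def tally(responses):
    """Return exclusive counts plus the inclusive 'L or both' / 'I or both' totals."""
    counts = Counter(responses)
    return {
        "L": counts["L"],
        "I": counts["I"],
        "both": counts["both"],
        "none": counts["none"],
        "L_or_both": counts["L"] + counts["both"],
        "I_or_both": counts["I"] + counts["both"],
    }

result = tally(responses)
print(result["L_or_both"], result["I_or_both"])  # 42 82
```

Since N = 100, the counts can be read directly as percentages.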
For each L-I pair, frequency counts were derived from the 5.8-million-word Pretoria Sepedi Corpus (PSC). For nouns the frequencies for the singular and plural forms were retrieved (Fsg and Fpl), for verbs the frequencies for the stems (Fst). In addition, the treatment (or lack thereof) was investigated in 9 Northern Sotho dictionaries. In (4), Columns 1-3 summarise the results from (3), the next two columns show the frequency data, while all remaining columns reflect the occurrences (or lack thereof) as lemma signs in 9 Northern Sotho dictionaries. (A list of dictionary abbreviations can be found at the end of this article. Note also that the dictionaries are arranged chronologically in the tables below.)
(4) Analysing the loan word survey - Level 3
From (4) one can for instance derive that the loan word malekere is more frequent in PSC than the original dimonamonane, and that both occur only in the plural in the corpus. The respondents, however, favour the original, while those current dictionaries which treat these forms do so rather evenly.
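As a rough illustration of what retrieving Fsg and Fpl for a noun involves, the sketch below counts exact-match tokens for a singular and a plural form. It assumes a corpus that is already tokenised into a list of strings; the function name `noun_frequencies` is hypothetical, and the token list is made up for the example and is not PSC data. How the PSC was actually queried is not stated here.

```python
from collections import Counter

def noun_frequencies(tokens, singular, plural):
    """Return (Fsg, Fpl): exact-match token counts for the two forms."""
    counts = Counter(tokens)
    return counts[singular], counts[plural]

# Made-up token list for illustration only; NOT data from the PSC.
toy_tokens = ["Janaware", "Pherekgong", "Janaware",
              "malekere", "Pherekgong", "Pherekgong"]

fsg, fpl = noun_frequencies(toy_tokens, "Janaware", "diJanaware")
print(f"Janaware ({fsg}/{fpl})")  # Janaware (2/0)
```

The printed form mirrors the (Fsg/Fpl) notation used for the pairs, e.g. a form attested only in the singular shows a zero after the slash.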

Items
For each L-I pair, the data presented in (4) and the extra information obtained from the supplementary conversations conducted with the respondents were combined. For each pair this resulted in what can be called an 'L-I sketch'. Compare, for example, the 'Janeware-Pherekgong sketch' shown in (5), recalling the standpoint of the then Language Board discussed above.
(5) R slightly favour L (42% vs. 38%). C seemingly (cf. below) only has I, and D agree on I. However, L appears as Janaware (19/0) in New E, T&O, Pop. and Grb., which is thus the form to be included in the D (and not Janeware (0/0)).
The extract shown in (5) is taken verbatim from the survey analysis. Even from this single (cryptic) L-I sketch the proscriptive approach should be evident: the target users and the corpus favour the loan word L (descriptiveness), and this happens to correspond with the Language Board's suggestion (prescriptiveness). However, as the target users and the corpus also indicate that the traditional word I has a right to be included in the dictionary, this 'variant' should also be treated on both the macro- and microstructural levels, albeit with the necessary cross-references and cursory notes (proscriptiveness).

Analysing the survey -Distracters
The Janeware-Pherekgong pair is one example of an L-I pair where the one-to-one correlation does not truly hold, yet where there is still a large overlap. Another example is shown in (6).
Taken at face value, the two options seem semantically different, with mmila 'footpath' versus seterata 'street'. However, when the respondents' considerations are studied, it becomes clear that mmila and seterata can indeed be semantically linked via 'road', since some respondents and dictionaries seem to imply a semantic continuum for the end points mmila versus seterata in footpath ↔ road ↔ street. This overlap is obvious in the treatment of mmila and seterata in, e.g., Ziervogel and Mokgokong (1975):
(7)(a) MMILÁ ... footpath, trail (of game), road, street, side-walk
(7)(b) -TÉRÁTA, se- ... street
Pairs such as Janeware-Pherekgong and mmila-seterata were purposely inserted into the survey in an attempt to trigger comments that would enable the deduction of the overriding sentiment with regard to loan words versus (more) indigenous words. For the latter pair, several respondents aged between 21 and 36, from different areas in both the Limpopo and Gauteng provinces, state that seterata is equivalent to mokgotha (13/47). This is (partly) true and clearly points in the direction of a preference for indigenous words.
Apart from pairs which only partly overlap, carefully selected distracters were also built into the questionnaire. These have been marked as such in (2) above. Their main aim was to verify the quality of the respondents' feedback. Some distracter pairs consist of two loan words L, for others the two options have very different meanings, and a third category was added just to find out if the respondents themselves were consistent. Consider (8) and (9) as examples of the first category.
In these examples the respondents were presented with a direct borrowing (not the purpose of the investigation) and a (Sothoised) loan word. In (8) just one respondent preferred newspaper as the only option and 6 allowed for both, while in (9) 10 opted for computer and another 7 for both. These distracter pairs thus managed to discriminate well. Compare also the following remark by one of the respondents:
"We must not include Sothoised words in the Northern Sotho dictionary if we have an original word for the Sothoised word, but if we do not have a word we can Sothoise any of the words, like computer can be called khomphutha in Northern Sotho, and include it in the dictionary instead of using an English word."
Another distracter also belonging to this first category is shown in (10).
Although both terene and setimela are loan words for 'train' (from Afrikaans 'trein' and English 'steamer' respectively), it is clear that the respondents (as well as the corpus and the current dictionaries) prefer the word which looks most like a genuine Northern Sotho word.⁴ Actually, none of the younger respondents realised that setimela is a loan word. An example of the second category of distracters is shown in (11).
The loan word sekerete means 'cigarette', the traditional word motsoko means 'tobacco'. Only a few respondents pointed out the difference in meaning, so this specific distracter failed. The third and last category represents an extreme case, as the distracter consists of two non-words, as can be seen from (12).
Although they were dealing with two non-words, half of the respondents still opted for what they perceived to be an original word.⁵ At first glance this might call into question the reliability of the feedback of a large number of the respondents. However, it is a clear confirmation of the respondents' general preference for indigenous words over loan words to be treated in a dictionary. Compare the following telling remark from one of the respondents in this regard:
"The word sephekgo is unknown to me. I chose it because I want to know what it means and I think that if it is included in the dictionary, most of the people who are like me and don't know anything about the word will know and understand it once it is explained in the dictionary. I therefore request that you include it."

General findings of the survey
The general findings of the survey will now be summarised. For the statistics presented below, the data for the randomly interspersed distracters have been excluded. Firstly, one can calculate the percentage of respondents in favour of loan words as opposed to the respondents in favour of their (more) indigenous counterparts. The results are shown in (13). This graph clearly indicates that an overwhelming majority, more than two-thirds, of those who accept only one option are in favour of the indigenous word. The following remark by one of the respondents is quite revealing in this respect:
"I personally believe that our language will lose value if more and more words from other languages are accepted in Northern Sotho. If we have original Northern Sotho words, why do we have to loan words from other languages? I don't see the necessity for us to loan words from other languages if we have our own original words. Only those words which do not exist in our language could be loaned from other languages, for example AIDS. The Zulus have managed to formulate their own word for AIDS and called it nxolazi [sic], why can't we the Northern Sotho people do the same and stop loaning from other languages?"
If one studies the overall pattern for 'loan + both' versus 'indigenous + both', then one arrives at the pattern shown in (14).
(14) Respondents' preference for 'loan words or both' vs. their '(more) indigenous counterparts or both'
From (15) one can clearly see that males tend to prefer the inclusion of loan words in dictionaries more than females do, as they suggest 6.8% more of them on average.
Thirdly, the above findings should be compared with the occurrence of loan words versus (more) indigenous counterparts in the corpus. The latter is done in (16). The Pretoria Sepedi Corpus (PSC) is based entirely on written sources. Ten times more indigenous words appear in PSC than their loan counterparts. This is a significant and clear indication that indigenous words are overwhelmingly preferred to loan words in written texts. If a dictionary were based solely on corpus data, chances would be good that loan words would turn out to be undertreated compared to the respondents' preferences. The difference between the respondents' preferences for loan words and the corpus attestations for those same loan words is as high as 20.4 percentage points (= 29.4 - 9.0). In order for a Northern Sotho dictionary to reflect the true needs of the community, it is thus clear that this research reveals an important gap in an approach based solely on corpus data.
The graph shown in (17) indicates that the nouns that were the topic of this study appear more often in their singular form than in their plural form in PSC.
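The comparison between respondents and corpus made above comes down to simple arithmetic, sketched here for transparency. The two loan-word shares (29.4 and 9.0) are taken from the text; the indigenous corpus share is not stated and is derived here under the assumption that the loan and indigenous shares together make up 100% of the attestations for the studied pairs. The variable names are illustrative only.

```python
# Shares quoted in the text (in %).
respondent_loan_share = 29.4  # respondents' preference for loan words
corpus_loan_share = 9.0       # corpus attestations of those same loan words

# Gap between what respondents want and what the corpus attests.
gap = round(respondent_loan_share - corpus_loan_share, 1)

# Assumption: loan and indigenous shares in the corpus add up to 100%.
corpus_indigenous_share = 100.0 - corpus_loan_share
ratio = corpus_indigenous_share / corpus_loan_share

print(gap)              # 20.4
print(round(ratio, 1))  # 10.1
```

Under that assumption the ratio of roughly 10 to 1 is consistent with the statement that indigenous words appear about ten times more often in PSC than their loan counterparts.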
(17) Singular vs. plural nouns in PSC
This simply means that, as far as the microstructure of a Northern Sotho dictionary is concerned, the studied nouns should rather be treated and exemplified in the singular. A clear exception is of course malekere / dimonamonane 'sweets' discussed above.
Fourthly, the respondents' input and the corpus data should also be compared with the overall treatment in the 9 currently available dictionaries. The latter is summarised in (18). From (18) one can see that, on average, the 9 available dictionaries for Northern Sotho treat both loan words and their (more) indigenous counterparts on a par. In order to see whether or not this situation would suit future dictionary users, the next series of analyses needs to be presented first.
Fifthly, and lastly, the sum of the user profiles should also be taken into account. As far as the profiles for rural versus urban birthplaces are concerned, the results are unfortunately inconclusive, probably as a result of the fact that most respondents commute frequently between, and live in, both places. This is an area which should be researched further. When the age profiles are summed, however, a clear pattern emerges. Indeed, the most significant finding of the survey is obtained when the respondents' answers are broken down according to age groups. In the shaded columns of (19) the respondents' feedback is divided into three major age groups, namely 16-21, 22-27 and 28-65 (each containing roughly the same number of respondents). On both sides of this shaded block the preferences for L versus I of the extreme age groups 16-17 and 48-65 are also indicated.
ers of Northern Sotho as target users. At the same time this result renders justification for the current dictionary situation as indicated in (18) above, where an almost equal treatment of loan words versus indigenous counterparts was found.
A typical manifestation of this phenomenon could be the preference pattern for teye versus fofo, both meaning 'tea', where it is clear that most of the respondents who chose the indigenous fofo are aged between 40 and 53, whereas those who opted for the loan word teye belong to the younger generation (compare in this regard the findings of Slabbert and Finlayson 1999).
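The sizes of the age groups used in (19) can be checked against the age distribution listed in the basic respondents' data in (1). The sketch below uses those published counts; the helper name `group_size` is hypothetical, and the group boundaries are the ones named in the text.

```python
# Age distribution of the 100 respondents, as listed in (1): age -> count.
age_counts = {
    16: 1, 17: 2, 18: 8, 19: 6, 20: 5, 21: 6, 22: 6, 23: 7, 24: 5,
    25: 7, 26: 4, 27: 7, 28: 5, 29: 8, 30: 4, 31: 2, 32: 5, 34: 1,
    35: 2, 36: 1, 37: 1, 38: 1, 48: 1, 53: 1, 54: 2, 65: 2,
}

def group_size(counts, low, high):
    """Number of respondents whose age lies in the inclusive range [low, high]."""
    return sum(n for age, n in counts.items() if low <= age <= high)

# Three major age groups and the two extreme groups named in the text.
major = {rng: group_size(age_counts, *rng) for rng in [(16, 21), (22, 27), (28, 65)]}
extreme = {rng: group_size(age_counts, *rng) for rng in [(16, 17), (48, 65)]}

print(major)    # {(16, 21): 28, (22, 27): 36, (28, 65): 36}
print(extreme)  # {(16, 17): 3, (48, 65): 6}
```

The three major groups thus hold 28, 36 and 36 respondents, which matches the description of groups of roughly equal size, while the two extreme groups are very small (3 and 6 respondents), so their percentages should be read with caution.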

Conclusion
In this article, loan words versus their (more) indigenous counterparts in Northern Sotho were studied from a lexicographic perspective. Within a proscriptive approach to dictionary compilation, the selection of the main variant is based on the analysis of a sound empirical basis, whilst that empirical basis also provides the less important items to be treated. It was shown that the by-now standard empirical basis consisting of data derived from an electronic corpus is not sufficient for the treatment of loan words, and that fieldwork is imperative. The latter was achieved by means of a survey conducted among 100 mother-tongue speakers of Northern Sotho.
By studying the respondents' comments as a whole, it is clear that they prefer the (more) indigenous words to be treated in dictionaries, and that loan words should only be used if there is no good alternative in Northern Sotho. Quite a number of respondents even suggest that words should be coined in order to have a Northern Sotho word instead of an adoptive from other languages. Where offered a direct borrowing and a (Sothoised) loan as only options, the (Sothoised) loan is preferred to the direct borrowing. This thus suggests the following preference hierarchy: indigenous word → Sothoised loan word → direct borrowing.
The most important finding of this study is that younger respondents seem to accept loan words much more easily than the older generation. This might be a result of the intensified influence from other languages in both rural and urban areas, as well as a direct consequence of the fact that most teenagers are no longer enrolling for Northern Sotho as a school subject. Also, older people tend to favour so-called 'old words' that are no longer known by the youth of today.
All in all, however, first preference is still given to (more) indigenous words over loan words whenever there is a choice. This pattern is rapidly changing, and today's dictionaries should definitely pay more attention to loan words than the dictionaries compiled half a century ago. A watchful eye will have to be kept on this evolving preference pattern.
1. Nowadays, one often finds so-called politically correct phraseologies such as "primary language" and "home language" for "mother tongue".
2. We are aware of the fact that a great deal of linguistic research has been devoted to what linguists variously call 'loan words', 'borrowings', 'adoptives', 'transliterations', etc. In this article, however, the focus is on lexicography, and we only use those terms in contrast to their traditional/original counterparts. We refer to the latter as '(more) indigenous counterparts'.
4. The noun setimela, e.g., has a class prefix of class 7 se-, while terene has no class prefix.
5. The non-words forminal and sephekgo were 'created' in such a way that they resemble genuine English and Northern Sotho words respectively. Actually, forminal was derived from a permutation of sections of the English word 'informal'. Unfortunately, an Internet search with Google (http://www.google.com/) returns 56 hits, revealing that forminal is a technical neurology term used in for instance 'Forminal Stenosis of Cervical Spine' or 'Extra Forminal Non-discogenic Lumbar Nerve Entrapment as a Cause of Sciatica'. We can however safely assume that no respondent knew this technical sense. The non-word sephekgo seems to be a Northern Sotho noun belonging to class 7 se-. Fortunately, no Internet pages were found containing the non-word sephekgo.

(13) Respondents' preference for loan words vs. their (more) indigenous counterparts

between preferences of males versus females results, with the data shown in (14) as a point of departure, in the percentages shown in (15).

(15) Males' and females' preference for 'loan words or both' vs. their '(more) indigenous counterparts or both'

(16) Loan words vs. their (more) indigenous counterparts in PSC

(18) Loan words vs. their (more) indigenous counterparts in 9 Northern Sotho dictionaries

Gender: Male 47, Female 53
Age (age: number of respondents):
16: 1, 17: 2, 18: 8, 19: 6, 20: 5, 21: 6, 22: 6, 23: 7, 24: 5, 25: 7, 26: 4, 27: 7, 28: 5,
29: 8, 30: 4, 31: 2, 32: 5, 34: 1, 35: 2, 36: 1, 37: 1, 38: 1, 48: 1, 53: 1, 54: 2, 65: 2
Choose from A and B those words which, according to you, should be included in a Northern Sotho dictionary. You may choose either A or B, or both. In C you may write any relevant comment, such as: any other word that you think is better than the two words already given in A and B; the correct spelling, if you see a spelling mistake in the words given; the reasons why the word you have chosen or given is the one to be included in the dictionary; or, conversely, the reasons why the other words should not be included in the dictionary; etc.

The plural form for Janaware is diJanaware and for Pherekgong it is diPherekgong, yet C does not contain any of the two plurals. 16% of R agree on both L and I, while 4% suggest nothing. The preferred form by both R and C is thus L, but with spelling Janaware. D are therefore very wrong, as they mainly focus on I. Klein has nothing. T&O (p. 23): "The names of the months of the year are Sothoised instead of using the Sotho names, e.g.: Janaware (instead of Pherekgong) ...". One of the respondents, aged 21, who is a student, female and a Northern Sotho speaker wrote: "Go ya ka nna re swanetše re hlalose mantšu ao a šomišwago tšatši ka tšatši re se ke ra šomiša Sepedi sa kgale seo e le go [sic] gore bana ba matšatši a ba ka se se kwešiše, go swana le bo 'Pherekgong' [sic]." ("According to me we should only include those words that are used every day and avoid using old Sepedi words like Pherekgong, which children of today do not understand.") 15% of the respondents say the correct spelling is Janaware (19/0) not Janeware (0/0), which means that the correct word to be included in D is indeed Janaware. Pherekgong may also be included, but with a note pointing out the potential confusion (January vs. March), and a cross-reference to Janaware.
Analysing the loan word survey - terene-setimela pair
(19) Distribution across age groups of the respondents' preference for 'loan words or both' vs. the respondents' preference for their '(more) indigenous counterparts or both' - 1. data