Challenges Encountered in the Compilation of an Advanced Shona Dictionary

This paper highlights the challenges encountered by the African Languages Lexical (ALLEX) Project (at present the African Languages Research Institute (ALRI)) in Harare, Zimbabwe, which is in the process of compiling an advanced Shona dictionary (ASD). Its forerunner is the general Shona dictionary, Duramazwi ReChishona (1996). The ASD is intended to be a comprehensive reference work, which will serve as a resource for more advanced users, especially those at higher secondary and tertiary education levels. The most important challenges have been in the areas of headword selection and the treatment of geographical/individual variation. The matters discussed here show the conflict between usage, i.e. popular acceptance, and (orthographic) norm, a problem often experienced in young literary languages subject to heavy foreign influence. This paper looks at: (a) the limitations of the current Shona orthography, the selection and codification of international vocabulary, and the presentation of variants and synonyms in the dictionary, and (b) the solutions suggested, and/or the ongoing debate on the topics.


Introduction
This paper looks at some aspects of the work of the African Languages Lexical (ALLEX) Project¹ (at present the African Languages Research Institute (ALRI)), which is in the process of compiling an advanced Shona dictionary (ASD). Its forerunner is the general Shona dictionary, Duramazwi ReChishona (DRC) (1996). The ASD is to be a comprehensive reference work for more advanced users, especially those at higher secondary and tertiary education levels. Faced with this huge task, the ALLEX team has had to deal with areas that were either only slightly explored or not explored at all in the DRC. Consequently, a number of challenges have come up which need to be addressed before the ASD is completed. These concern headword selection and the presentation of variants and synonyms in the dictionary. This paper highlights some of the problems encountered, the solutions suggested, and/or the ongoing debate on the topics.
The questions touched on in this paper are not unique to Shona. They stress the conflict between usage and norm, a problem often experienced in young literary languages subject to heavy foreign influence. Much has already been written on theoretical debates and discussions on similar lexicographic questions in other languages (see e.g. Haugen 1976 and Sinclair 1987). In this paper I present the practical problems that the team faced and the solutions that were suggested, without getting too deeply involved in theoretical issues and debates implied by the choices made. I do, however, refer to and draw parallels with the Scandinavian experience between the 16th and the 20th centuries.

Headword selection
Headword selection has had implications for the current Shona orthography, as indicated below.

The problem
Work on the ASD has shown that certain letters and/or digraphs, such as l, th and rh, are absent from the standard Shona orthography yet occur in the corpus and in everyday spoken Shona, which makes a well-justified case for including headwords containing such letters in the dictionary. Haugen (1976) reports that Scandinavian grammarians battled with these same questions back in the 17th century. Commenting on the heated debates that ensued between traditionalists and radicals, Holberg (Epistle 415, cited in Haugen 1976: 395) says, "there are no wars as bitter as those fought by grammarians".
The question of orthography has been discussed in the general planning and training meetings and workshops, and certain recommendations have been made which need to be pursued. The following options have been explored:
(a) Avoid including headwords containing these letters and/or digraphs.
(b) Systematically replace l with r in headwords, as is customary with adopted words. It has worked for e.g. bhora (< ball), firimu (< film) and bhurara (name referring to a pesticide < Nguni bulala, to kill).
(c) Delete l in adoptives, as has occurred in sauti (< salt) and musoja (< soldier).
(d) Continue to ignore words with th and rh, as has been happening already, or systematically replace th with t and rh with r.
(e) Include adopted headwords containing these letters with the letters intact, as is shown in the following indigenous and adopted examples in which l, th and rh occur both in the corpus and in everyday Shona: (i) Shona dialect, e.g. kudhla (standard Shona kudya) (to eat) and kutyla (standard Shona kutya) (to fear), (ii) Shona proper names, e.g. Sithole and Mlambo, and (iii) adoptives, e.g. kukala (to colour), kudhila (to deal), mudhila (a dealer), ndombolo (a Zairean dance), yelo (yellow), bhuluu (blue), losheni (lotion), kalenda (calendar), themomita (thermometer), bhethi (birth certificate), thiyeta (theatre), thiyori (theory), rhumba (type of music) and rhumha (a frightening thing).
A closer look at the occurrence of these items in the corpus leads to the realization that these words are now part of the everyday Shona language and that it would be unwise to continue to ignore their existence. Besides, there is a limit to the extent to which "foreign" letters can be replaced with local ones, as there is an aesthetic side to both written and spoken language that cannot be ignored. It should also be pointed out that dictionaries almost always seem to carry words that are not consistent with the current orthography (see e.g. Kristu (< Christ) in DRC).
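The options discussed above can be illustrated with a minimal sketch. It assumes that a plain substring check is enough to spot the letters and digraphs in question, and that option (d) combined with option (b) amounts to simple string replacement; the function names are hypothetical and are not part of any ALLEX tool.

```python
# Letters/digraphs the paper names as absent from the standard Shona orthography.
# Digraphs come first so results are reported in a stable order.
NONSTANDARD = ("th", "rh", "l")


def nonstandard_graphemes(word):
    """Return the non-standard letters/digraphs found in a candidate headword."""
    lowered = word.lower()
    return [g for g in NONSTANDARD if g in lowered]


def replace_with_standard(word):
    """Illustrate systematic replacement: th -> t, rh -> r, l -> r.

    This is a naive string substitution, not a phonological analysis.
    """
    lowered = word.lower()
    for source, target in (("th", "t"), ("rh", "r"), ("l", "r")):
        lowered = lowered.replace(source, target)
    return lowered


if __name__ == "__main__":
    for headword in ("themomita", "kalenda", "rhumba", "Sithole", "bhora"):
        print(headword, nonstandard_graphemes(headword), replace_with_standard(headword))
```

A check of this kind only flags candidates for discussion. Whether a flagged form is kept intact, as in option (e), remains an editorial decision.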

Decision taken
Our position, therefore, was that while the standard orthography needs to be respected, the corpus needs to be respected as well, as it should not happen that community norms prevent one from hearing about themomita (< thermometer) because its spelling has a th in it. The honest editor is true to the language as it is used. It may be that the team will be criticized for violating the integrity of the language by introducing strange letters and/or digraphs to the language in what will be interpreted as a normative work. But we also have to take other factors into account. Orthographic revisions have been made before and can still be made now. There have been three official orthographies for Shona so far, and we already know that dictionaries help with acceptability. It was therefore agreed that the following was to be done:
(a) Include adopted headwords containing these letters with the letters intact (as Kristu was in DRC, even though it contains two consonant clusters which are prohibited by the standard orthography), and with etymological information included in the entry as well as an explanation in the front matter.
(b) Present the problem to the speakers of the language. Explain that this is not a problem peculiar to Shona but that other languages go through similar problems, and describe how these communities handled the problem. The presentation could be in the form of a seminar or an advert in the newspapers that would solicit responses and suggestions.
(c) Request the Shona Language Committee to recommend their use. One of the Committee's roles, among many others, is to examine and discuss language issues raised by users and make the necessary recommendations to the education ministry.

International vocabulary
From the scope of headwords selected, the ASD will be a different dictionary from DRC. This will also be achieved through including international vocabulary (IV) in the dictionary. IV often refers to technical words which carry specific, unchanging and unambiguous senses in the contexts in which they occur and are used internationally. Most of these words are encountered in scientific and technical subjects taught in schools, where part of the mastery of each discipline entails the mastery of the concepts in it. If the ASD is to be promoted, it must serve as a practical tool for teachers who want to improve teaching in Zimbabwe by teaching and writing textbooks and other materials in Shona. Teachers would eventually be teaching these subjects in at least Shona and Ndebele and would, therefore, need a source to consult.

The problem
While lexicography entails a codifying function, lexicographers do not like to perform this role because it forces them to be more prescriptive than they desire. How then were we to determine the headwords in the present circumstances? While Shona is undoubtedly a very rich language in other areas, it has had to borrow a lot on matters that "relate to material culture and technology" (Chimhundu 1983: 341). The questions that needed to be deliberated on were: Should we continue to adopt and adapt the international headwords from English and other foreign languages? Should we use indigenous headwords (where they exist) or should we coin new ones (where they do not exist)? Should the headwords be restricted to those that are actually being used or should the team go beyond these, and if so, how far beyond?
In all these deliberations we had to remember that consistency is central to science and technology. While we would like things to be uncomplicated and restrict IV headwords to those that are actually used, as lexicographers we still have the responsibility to show the scientific and technical professions that indigenous languages can be used in these circumstances. Again, the Scandinavian developments of the 17th and 18th centuries shed light on this challenge. Haugen (1976) writes that the concern of lexicographers and grammarians at that time was for the purity of their languages as they battled with foreign influence and the question of whether or not to coin new words, to revive dying words or to use foreign ones. While the general policy in the rest of Scandinavia was to promote indigenous words over adoptives as much as possible, Iceland's position was an outright "no loans from other languages" (Thorlaksson 1612).

Decision taken
We agreed that where good indigenous words existed for the technical/scientific concepts that needed naming, it would be best to promote those words over adoptives. Shona already has a good tradition of naming in which it observes the behaviour of a thing/concept and then gives it a name that qualifies that observation. Our feeling was that this tradition should continue through new coinages, which would include new Shona words as well as semantic extensions of already existing ones. With coinages we had to face the possibility that speakers might not use them and would opt to stick to adopted forms with which they were more familiar. For this reason, we would only coin new words that we believed were likely to be easily accepted.
We made these decisions fully cognisant of arguments emphasizing the difficulties of using indigenous words and coinages. We were aware that the use of indigenous words and coinages could raise problems, since these were words that existed in the language and hence had meanings attached to them already. As a result, they could be ambiguous in some cases, e.g. denderedzigama (whose meaning we have extended to mean semicircle in geometry), which can be broken down as denderedzi- (circle) + -gama (not full). "Not full" does not necessarily mean "half full". This means that ambiguity may arise where the same word may be used to mean e.g. a quarter or half full. Another illustration comes from the terms for "High Court" and "Supreme Court", which have been developed in Shona as dare guru (lit. big court) and dare guru repamusoro (lit. big court of top) respectively. However, it can be difficult to differentiate their respective uses when both courts are sometimes referred to just as dare guru. Further ambiguity results as any higher court may also be called dare guru. Parliament is called dare guru (reParamende), where dare now refers to an assembly. Some users feel that the introduction of adoptives from English could diminish this potential for ambiguity. For example, koti (< court) is regularly used and accepted by some as part of the Shona vocabulary. Also, a number of IV words are used in Shona speech, perhaps through code-mixing and code-switching, but they do not appear in writing, e.g. the adoptives mainasi/mainazi (< minus), which are still very often used, even though -bvisa/-bisa (minus/subtract) has been well integrated (see Appendix A for more examples).
Drawing on the Scandinavian example again will show that attempts to replace the Graeco-Latin terminology with indigenous equivalents succeeded only in creating occasional synonyms for the Graeco-Latin terms. Translators tried with varying degrees of success to find equivalents, many of which became standardised while the others survived as synonyms. The team therefore decided to enter the parallel terms, so that the matter can resolve itself over time, when one term might gain favour above the other.
Our argument for the use of these indigenous terms is that IV words and technical terms do not just establish themselves. They come from an authoritative source. The dictionary should set the norm by conventionalizing the use of some of these IV terms. An example of a country that seems to have succeeded in implementing the use of indigenous terms is Iceland, where all school subjects are taught in Icelandic. The language policy there is to preserve the structure of Icelandic, thus eliminating any foreign influence. Government supports the structure-preserving policy strongly by funding full-time terminographers and providing facilities to ensure that the work is done successfully. Everyone has been sensitized to the policy, and they all strive to actively contribute to the development of Icelandic terminology. Ambiguity, however, continues to be a problem.
On the other hand, the argument in favour of adopting terms is that even those languages that are now said to be developed have also borrowed terminology from others. For example, nearly all the morphemes used in IV words are Greek or Latin, though Shona speakers have encountered them through English. Consequently, in naming some of these scientific concepts we had to concede that while it would be preferable to use our own indigenous words, consistency across the world in the sciences is also needed. Therefore, adoptives would be selected where no indigenous words existed or where a coinage would run the risk of being rejected completely.
Moreblessings Busi Chitauro-Mawema

... treated as synonyms and cross-referenced to one another. In contractions and abbreviations the complete form will function as an explanation.
In the ASD we decided to use mainly the explicit type of cross-referencing. Cross-referencing in the dictionary will, however, be complicated by the complexity of the Shona-speaking community, where variation carries regional political connotations. The term Shona, which came into use after Doke's publication of his report on the unification of the Shona dialects in 1931, is a collective term for a group of five main, mutually intelligible varieties. These are Zezuru, Karanga, Manyika, Ndau and Korekore. Standard Shona is mainly made up of Zezuru and Karanga, in that order. Doke's report (Chimhundu 1983) confirmed the zones/tribes (sic) established by missionaries, and made the Zezuru dialect, the dialect of the capital, the norm for phonemic analysis. As a result, Zezuru gained more prominence than the other dialects in usage and acceptance. The ministerial directive of 1982, which allowed dialectal variation in writing, tried to reverse this state of affairs. "Controlled flexibility" was now to be allowed in spelling, thereby allowing dialectal variation. However, the fear expressed by some is that, despite the ministerial directive, some varieties may in practice continue to dominate at the expense of the others. Cross-referencing means that one commonly used form, presumably from the more dominant varieties, will be the main entry, while the less commonly used, most likely from the less dominant varieties, will continue to be considered of less weight. This means that varieties like Zezuru and Karanga will continue to enjoy the prestige that they have always enjoyed, compared to the other varieties like Manyika, Ndau and Korekore. According to language experts² who have been in the teaching field for some time, Zezuru continues to enjoy the prestige that it has had since Doke (1931). They observe that, firstly, Korekore is more or less Zezuru, and secondly, Manyika will tend to borrow more from Zezuru than from Karanga, its nearest rival, because Zezuru has prestige, because of its historic association with the capital city, and also because in some ways it is easier to understand than Karanga.

Decision taken
While all these other factors may prevail, we as a team are still convinced that, to save space, the explicit type of cross-referencing is the best. Besides, our dictionary is based on a fairly representative corpus in terms of regional coverage. Hence, where there is doubt, the selection shall be made on the strength of the corpus evidence after all other factors have been considered (see the weti/mutundo illustration below for more clarification). The ASD presents four types of variation, and this is how they are treated (see also sample entries in Appendix B). (Statistics indicating the frequencies of occurrence were extracted from the ALLEX Shona corpus of over 2 million running words. The form marked with an asterisk (*) is the main headword, the more commonly used term.)

Variants
(a) Regional variations
There is a fair amount of variation among dialects, both with respect to the stem-changing derivational processes and phonological alternation, e.g.

*nzeve

Alternate spellings and forms of adopted words may also result from individual/regional pronunciations of the adopted words, e.g.

bhazi/bhasi/bhesi (< bus)
shati/sheti (< shirt)

Besides the individual/regional variations, the older generation also seems to pronounce these adopted words differently from the younger generation. The older generation's pronunciation is more faithful to the spelling, while the younger generation's is more faithful to the pronunciation of that word by the English-speaking community, cf. shati/sheti. This situation seems to result from the way the language was acquired by these two generations. At the time the older generation acquired the language, they had more access to the printed than the spoken word, hence they pronounced words according to spelling, as is done in Shona. The younger generation, on the other hand, has had access to both. Furthermore, there was the Roman Catholic influence which pronounced /a/ in man, the, bank, etc. All these factors have led to a multiplicity of terms that will need to be provided for in the dictionary. Again, where there is doubt, corpus evidence will determine which form is to be the main entry.

Synonyms
Again, in the interest of saving space, synonyms should be defined only when it is absolutely necessary. Otherwise a less commonly used form would be cross-referenced to the more commonly used one carrying the definition. Where in doubt, the strength of the corpus would also help determine the main headword, e.g.

-svikirwa (be possessed)  106
-sutswa  0

However, as we discovered, the problem is not always easy to deal with, especially in cases where users feel strongly about their own form, e.g.

mhuka/mututu/muhotwe (nosebleeding)
In cases like this, the standing decision is to define each one and cross-reference to one another, space permitting.
The general trend throughout the dictionary would be to promote indigenous words as much as possible. What we observed, however, is that where an indigenous and an adopted headword compete, the indigenous word may not always turn out to be the one more frequently used. In such cases we may disregard frequency of use so that the indigenous word (or both the indigenous and the adopted word) will carry the definition, examples, etc., e.g. mutundo/weti. In the mutundo/weti example, we witness a common feature of the language. This is the use of euphemism for taboo words, especially those referring to excretory functions or sex. In this illustration the corpus has 23 occurrences of weti and 0 for mutundo, yet mutundo is the main word. This example illustrates why we have to be careful when taking frequencies of occurrence as the ultimate answer to our problem. Other cultural and social factors need to be considered as well.
Contractions and full forms will be treated as synonyms. Some dialects use the contracted form while others use the full form, e.g. chipotswa/chipotserwa (casting a spell).
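The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: the class and function names are hypothetical, the frequencies for weti and mutundo are those quoted in the text, and treating mutundo as the indigenous form and weti as the adoptive is an assumption read off the discussion rather than a label taken from the dictionary database.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    form: str
    frequency: int    # occurrences in the corpus
    indigenous: bool  # True for an indigenous word, False for an adoptive


def choose_main_headword(candidates, prefer_indigenous=True):
    """Pick the form that carries the definition.

    With prefer_indigenous=True, frequency is only compared among
    indigenous forms (if there are any); otherwise the most frequent
    form wins regardless of origin.
    """
    pool = list(candidates)
    if prefer_indigenous:
        indigenous_forms = [c for c in pool if c.indigenous]
        if indigenous_forms:
            pool = indigenous_forms
    return max(pool, key=lambda c: c.frequency)


def cross_references(candidates, main):
    """Map every other form to the main headword (explicit cross-referencing)."""
    return {c.form: main.form for c in candidates if c.form != main.form}


if __name__ == "__main__":
    pair = [Candidate("weti", 23, False), Candidate("mutundo", 0, True)]
    main = choose_main_headword(pair)
    print(main.form, cross_references(pair, main))
```

Setting prefer_indigenous to False reduces the rule to pure corpus frequency, which is the behaviour the mutundo/weti case warns against relying on alone.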

Conclusion
This paper tried to highlight some challenges that face the ALLEX team as it works on its second monolingual Shona dictionary.The lesson that the team has learnt from the Scandinavian experience is that the development and cultivation of indigenous languages is a gradual task, which may take centuries.For the Scandinavians it was only after three to four centuries that the established languages started stabilizing.In the ALLEX team we are content, for now, in the knowledge that we have laid a foundation of codification and cultivation of Shona on which later generations can build.
1. The ALLEX Project has operated as a tripartite co-operative venture in monolingual lexicography under the UZ/NUFU Agreement between the University of Zimbabwe, the University of Oslo and the University of Gothenburg.

2. This observation is drawn from consultations with Korekore and Manyika colleagues.