Compiling the First Monolingual Lusoga Dictionary

This article describes the approach followed to compile the first-ever monolingual dictionary for Lusoga. Lusoga is a Bantu language spoken in Uganda by slightly over two million people. Because Lusoga is an under-resourced language, an orthography had to be designed, a grammar written, and a corpus built before the compilation of the dictionary could begin. The compilation was aimed at attaining an academic degree, hence requiring a rigorous research methodology. Firstly, the prevailing methods for compiling dictionaries were mainly practical and insufficient in explaining the theoretical linguistic basis for dictionary compilation. Since dictionaries are based on meaning, the theory of meaning was used to account for all linguistic data considered in dictionaries. However, meaning is considered at a very abstract level, far removed from the process of compiling dictionaries. Another theory, the theory of modularity, was used to bridge the gap between the theory of meaning and the compilation process. The modular theory explains how the different modules of a language contribute information to the different parts of the dictionary article or to dictionary information in general. Secondly, the research also had to contend with the different approaches for analysing Bantu languages for Bantu and European audiences. A description of the Bantu- and European-centred approaches to Bantu studies was undertaken in respect of (a) the classification of Lusoga words, and (b) the specification of their citations. As a result, Lusoga lexicography deviates from the prevailing Bantu classification and citation of nouns, adjectives and verbs in particular. The dictionary was tested on two separate occasions and all the feedback was considered in the compilation process. This article, then, gives an overall summary of all the steps involved in the compilation of the Eiwanika ly'Olusoga, i.e. the Monolingual Lusoga Dictionary.


1. The status of Lusoga
Lusoga (J16 in Guthrie's (1948) classification) is the third-largest language in Uganda, with 2 062 920 speakers, which corresponds to 8.6% of the total population (Uganda Bureau of Statistics 2005: 12). Lusoga is spoken in an area called Busoga, in the eastern part of Uganda. The sociolinguistic situation in Jinja, which is the main administrative town of Busoga, is multilingual. The language history of the Busoga region goes back to the 1944 Makerere Conference on Language, which deliberated on languages of instruction in schools and decided that Luganda, Acoli, Runyoro, Ateso and Lugbara be the media of instruction in the entire country (Ladefoged et al. 1971: 87-89).
Luganda was chosen as the medium of instruction in Busoga. Although this policy was abandoned shortly after Uganda's independence in 1962, Luganda had already been established in the Busoga region as the medium of instruction in lower primary school. The 1965 language policy left English as the inevitable lingua franca of Uganda and as the medium of instruction from the seventh school year onwards (Ladefoged et al. 1971: 90). In June 2005, the parliament of Uganda approved the teaching of Lusoga as one of nine regional indigenous Ugandan languages. Documentation of Lusoga was hence required.
Lusoga was categorized as an undocumented language because the available literature in Lusoga was very substandard. Although attempts to standardize the Lusoga orthography were made by the Cultural Research Centre (CRC and LULANDA 2001) and the Lusoga Ecumenical Board (LEB 2000), both orthographies were inconsistent in their description and very shallow in their coverage. Likewise, the only available grammars of Lusoga (CRC 2000 and Babyale 1999) gave a pedestrian treatment of grammar, with English translations for tourists. There were no operational Lusoga language boards to regulate the documentation of Lusoga. Interested Lusoga speakers embarked on writing what they thought fit with regard to the language, without addressing the structural composition of Lusoga. As a result, a thorough analysis of the structure of Lusoga was necessary to provide a foundation for the compilation of the first monolingual dictionary of Lusoga.
A review of literature on cognate languages was used to discuss the structure of Lusoga, on which the decisions in the compilation process were based. The cognate Bantu languages selected were Zulu (S42), Shona (S10), Swahili (G41, G42, G43), and Luganda (J15). A comparative analysis of the structure of these Bantu languages was used as the basis for specifying the Lusoga orthography, grammar and lexicography. The resulting dissertation, Nabirye (2008), addressed Bantu lexicography in general and Lusoga lexicography in particular. The different steps taken in the compilation of the monolingual Lusoga dictionary itself (Eiwanika ly'Olusoga, Nabirye (2009), henceforth WSG) were arranged in chapters to explain the process of the compilation of the dictionary as an academic study.

2. A review of some proposed frameworks for compiling dictionaries
Lexicography has been alienated from the study of linguistics, so much so that scholars such as Hartmann (2001: 111-112) doubt whether a lexicographical process qualifies as a research study. Piotrowski (1994: 5-8) thinks that perhaps lexicography is not a branch of linguistics but a discipline of its own. Pawley (1985: 99) believes lexicography is not conducted according to stipulated theoretical principles, and that linguists turn into lexicographers at different points in lexicographical research. As a result, Piotrowski (1994: 8) concludes that lexicography is a complex field, and that "a proper approach to its theory is to evolve a flexible framework which could include as many different approaches as possible". The same or similar opinions on lexicography are shared in studies like Wiegand (1984), Hausmann (1986) and Zgusta (1986).
Against this background, the compilation of a dictionary for academic purposes was very challenging, because a theoretical framework to explain the compilation of dictionaries was required. Practical considerations, which provided the only existing framework for compiling dictionaries, were insufficient in accounting for all the linguistic data considered in dictionaries. The existing methodological frameworks for compiling dictionaries offered no link to the theoretical foundations on which the compilation of dictionaries rests.
For example, a methodology-based lexicographical study like Van Keymeulen (2003) concentrates on the practical aspects of data collection, and users are referred to the relevant handbooks for the theoretical background. Since language references exist and a minimum level of language proficiency is assumed from the users of the target language, this methodology starts at a higher level in the compilation process than that required for the compilation of the WSG.
As a second example, the methodology introduced in De Schryver (1999) does not specify the theoretical basis of the different activities in the compilation process. It focuses more on the practical considerations for compiling a dictionary based on corpus analysis. This methodology is realized at an even higher level of dictionary compilation than that of Van Keymeulen (2003) and is thus even further removed from the methodology used in the compilation of the WSG.
For instance, the De Schryver (1999) framework, a corpus-based approach to the compilation of paper dictionaries within the framework of Simultaneous Feedback (SF), was inapplicable to the Lusoga dictionary compilation process, since corpus analysis of Lusoga was not considered beyond the compilation of a Lusoga word list. Applying De Schryver's framework to the Lusoga compilation context would have meant postponing steps 2, 4, 6 and 8 of the SF framework until some of the required parameters involved in the compilation process had been synchronized. Though simultaneous feedback can be applied at the testing stage, dictionary testing is itself considered an advanced stage in the dictionary compilation process. The reason is that prior to testing, a theory and the hypothesis on which the testing feedback is based should be specified, and questions eliciting feedback should be geared towards testing this hypothesis. However, since the theoretical foundation of the simultaneous feedback framework is not specified, the compilation of the WSG had no justifiable foundation to support the selection of questions for a questionnaire or even to critique the nature of the feedback.
Notably though, both Van Keymeulen (2003) and De Schryver (1999) have made a big contribution to the methodology of compiling dictionaries. What was missing was a specification of how far the proposed methodologies apply to different dictionary compilation contexts. For instance, Van Keymeulen is geared towards the compilation of dictionaries for undocumented languages. Lusoga was categorized as an undocumented language, but the contexts in which the dictionaries were compiled are not the same. Lusoga lacks the required language references available in the case of Van Keymeulen, and none of the speakers has ever been taught the language; in most cases Lusoga speakers have only used the language orally, but have never read or written it. Everything about the language and its analysis was new to the target user of the WSG. De Schryver, on the other hand, was geared towards the compilation of a bilingual Cilubà-Dutch dictionary. A bilingual audience has different characteristics from a monolingual audience. Though Cilubà may be a Bantu language like Lusoga, the way data is structured will vary depending on the target user. The context in which the Cilubà-Dutch dictionary was compiled thus differs from the one in which the WSG was compiled. If the bilingual Cilubà-Dutch dictionary had instead been a monolingual dictionary compiled for Cilubà speakers, then De Schryver would likely have proposed a methodology more comparable to that of the WSG.
The main misgiving noted is that the methodologies above approached the framework from specific contexts whose conditions ended up being restrictive. A context-free and generally applicable framework for compiling dictionaries was found missing. There was a need for some sort of theory to account for the language data to be put in dictionaries regardless of the context, the type of dictionary compiled or the ultimate dictionary user.
The methodologies discussed so far help to show two routes through the compilation process. Firstly, the compilation can start from theoretical considerations, bypass corpus considerations, and go on to compile a dictionary, especially for undocumented languages. Secondly, a dictionary can be compiled by starting from a theoretical framework, moving on to corpus analysis, and ending in a final product, especially for relatively well-documented languages. The gap that remains is the specification of the foundation of the entire dictionary compilation process. This article therefore aims to explain this foundation by showing how theory and practice were merged in the compilation of the WSG.

3. Presentation of Nabirye's (2008) framework for compiling dictionaries
In support of Piotrowski (1994: 8), no single theory is able to account for the entire dictionary compilation exercise. What a theory for compiling dictionaries entails, therefore, is a series of theoretical and conceptual road maps to guide lexicographers from the uncovering of the smallest bit of meaningful language data to its rendering into a dictionary. The compilation framework helps in the decisions on how such information can be interpreted and placed in a dictionary so that it serves both the purpose of the dictionary and the intended audience adequately. At any point in the process, the lexicographer should be enabled to examine the language from the most abstract meaning formations that string together the grammar or the lexicon of the language. A swift guide from theory to practice and back is necessary, particularly for a lexicographer compiling a dictionary of a less-documented or undocumented language, who may also have to establish the description of the language for the first time.
Lexicographers in this context are not only compiling the first dictionaries of a language, they are also specifying the structure of the language on which the dictionary is based. Hence, the main question to ask is: What is the most logical place to start the investigation of an undocumented language? The answer lies in the study of meaning, and therefore the theory for compiling dictionaries should also start with the theory of meaning.

3.1 The theory of meaning
The hypothesis drawn from the discussion on the study of meaning in Ogden and Richards (1923: 110-112), Lyons (1977: 27-29), Hurford and Heasley (1983: 91), and O'Grady, Dobrovolsky and Katamba (1996: 276) states that the study of meaning can be described based on four premises, namely:
(a) That the properties of a language are specified and defined.
(b) That the nature of words and their relations are established to provide a foundation for the interpretation of their senses.
(c) That all observable characteristics in the speech acts are analysed to contextualize the observable utterances that users of a language are likely to make and what they could mean in each case.
(d) That language in specific contexts should also be analysed to contextualize the different forms of usage that good use of words depends on and also explain how the same words in different contexts can produce different meanings.
If all these premises are analysed, then the meaning of the different parts of the language and its properties can be established. The composition of the above-named hypothetical variables enables the understanding of the different meaning categories of an entire language. A lexicographer should thus be enabled to tap into the foundations of the meaning existing in each of the variables in order to get data for a dictionary. The lexicographer then endeavours to figure out how all the meaningful parts of a language ought to be represented in a dictionary. At this level, however, the analysis of meaning is relatively abstract, and lexicographers need guidance on how to retrieve information about a language from the theory and render it into a dictionary. Another theory, one that classifies meaningful language properties into the broader linguistic categories that are the basis for studying linguistics, is thus required to bridge the gap between the theory of meaning and the activities of compiling a dictionary. This theory, the modular theory, looks at the different levels of linguistic analysis and the specific information at each level.

3.2 The modular theory
This theory considers interdependent modules or levels of language that have access to information in other modules in order to account for a full analysis of a language. The interconnectivity of the different modules of language analysis helps to specify dictionary information on which lexicographical decisions can be applied. For example, the phonological module provides sounds and their meaning, the morphological module provides the word structure, the semantic module provides the meaning of words to be defined, the syntactical module provides sentence patterns, and the graphology module provides the graphical representation of all linguistic data considered; together these display the structure of a language. The modularity of language is analysed to specify the type of information each module or level of linguistic analysis can contribute to a dictionary. The selection of information to consider in a dictionary depends on the type of dictionary a lexicographer would like to compile. A lexicographer compiling a bilingual dictionary, for example, will choose different data from the modules than someone compiling a monolingual dictionary. However, the general language repository for all types of dictionaries is the same regardless of the context of compiling.
Svensén (1993: 4-5) divides dictionary information into five categories, namely: formal categories concerned with spelling, pronunciation and inflection; combinational categories mainly based on the morphosyntactic nature of a language; a semantic category dealing with the nature of words and their relationships; encyclopaedic and pragmatic categories providing non-linguistic information such as verbal encyclopaedic information and pictorial illustrations; and lastly the historical perspective category catering for information such as etymology (i.e. word origin in relation to the time axis, which cuts across most of the categories, incorporating elements from several of them).
If the data categories above give a comprehensive representation of dictionary data, then at this level in the compilation process the lexicographer is able to sort this data to match the type of dictionary to be compiled. However, in order for the information to be appropriately selected, entered and organized in a dictionary, principles of compiling also have to be introduced into the framework to guide the lexicographer's decisions. The third level of the framework is therefore the consideration of the principles of lexicography which guide the general formatting of the dictionary.
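The idea of sorting categorized data to match a dictionary type can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `ArticleData`, its methods, the category labels and the placeholder values are hypothetical and are not taken from Svensén (1993) or from the WSG database.

```python
from dataclasses import dataclass, field

# Labels paraphrase the five information categories; the names are hypothetical.
CATEGORIES = (
    "formal",
    "combinational",
    "semantic",
    "encyclopaedic_pragmatic",
    "historical",
)

@dataclass
class ArticleData:
    """All data gathered for one lemma, grouped by information category."""
    lemma: str
    info: dict = field(default_factory=dict)  # category label -> list of items

    def add(self, category, value):
        # Reject labels outside the five assumed categories.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.info.setdefault(category, []).append(value)

    def select(self, wanted):
        """Keep only the categories chosen for a given type of dictionary."""
        return {cat: items for cat, items in self.info.items() if cat in wanted}

# Placeholder values only; no real lexical data is implied.
entry = ArticleData("eiwanika")
entry.add("formal", "spelling: eiwanika")
entry.add("historical", "etymology: (placeholder)")
chosen = entry.select({"formal", "semantic"})
```

Here `chosen` keeps only the formal information, because no semantic data was added and the historical category was not requested. Which categories a given dictionary type should request remains a lexicographical decision that the sketch deliberately leaves to the caller.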

3.3 Principles of lexicography
Principles of lexicography are practical in nature because they guide the actual compilation activities by reminding the lexicographer of what to look out for in the process of compiling a dictionary. The practical methodology for compiling dictionaries therefore considers the following variables: the type of dictionary to be compiled, optimization methods used to select dictionary information, citation forms of word classes, defining methods, and the organization of the dictionary (with regard to parts of the dictionary and each dictionary article). When all of the above steps are completed, a study of how the dictionary is received by target users may be undertaken, reflecting how each of the variables in the compilation process was carried out.
In the process of determining the type of dictionary, the data to be entered is selected as well. Optimization methods are then applied to the available data, which is formatted according to the optimization techniques the lexicographer decides to employ when choosing data for the dictionary and its representation. Data is then grouped into the different grammatical and/or lexical categories and entered in a unified format throughout the dictionary. Words are defined after deciding on their citation forms. These definitions reflect the linguistic characteristics of each grammatical category throughout the dictionary.
The organization of dictionary data is also essential to the compilation process. The determination of the structure of each dictionary article and the general format of the dictionary is needed.
If all the above steps are systematically followed, a dictionary should be deemed well compiled. It is thereafter left to the users and reviewers to determine how useful that dictionary is and how easy it is to access. This may also be determined through testing the compiled dictionary with the target readership.
A dictionary that provides adequate coverage for the intended user and is easy to access should be well received. However, if it fails on these points, then it is concluded that it has not adequately satisfied its intended objectives. A summary of how the entire dictionary process is conducted using Nabirye's (2008) theoretical framework is shown in Figure 1.

4. Application of Nabirye's (2008) framework in the compilation of the WSG

4.1 Introduction
The summary above is a presentation of the application of the entire framework in the different compilation stages. Other frameworks (Van Keymeulen 2003 and De Schryver 1999) are also considered in this presentation. The compilation process of the WSG was divided into four stages, namely: preparation, word collection, compilation, and testing. The present section focuses on how the framework is used in the compilation of dictionaries in general.
The first stage is the preparation for the compilation, which is mainly library research aimed at equipping the lexicographer with a good understanding of the language of compilation. The formulation and interpretation of all the theoretical considerations is undertaken at this stage to form hypotheses to test later on in the compilation process. This would be the best time to apply Nabirye's (2008) framework.
The second stage is aimed at collecting words for the dictionary. Compilers of dictionaries for undocumented languages use this stage to look for words, recordings, speeches and any other available informal sources of words for the dictionary. This would be the best time to consider Van Keymeulen's (2003) framework for compiling dictionaries in undocumented languages. Lexicographers compiling dictionaries in documented languages, however, mainly use this stage to explore corpus analytical methods to collect words and their meanings. For documented languages, then, this is the most appropriate time to consider De Schryver's (1999) framework for compiling dictionaries.
The third stage, which is the compilation itself, merges all the steps of the framework depending on the particular data to be considered in the dictionary. If the meaning of a particular language element is not clear, then a semantic analysis can be used, a testing measure can be improvised, or a lexicographical decision can be employed to assist in the final rendering of such information in the dictionary.
The last stage is the testing of the dictionary. As stated earlier, the testing instrument also has to be grounded in the theory. Diverging views and any new descriptions of a language or changes in the compilation style are tested, based on the theoretical considerations stipulated in the preparation stage and applied during compilation, to generate the simultaneous feedback advocated by De Schryver (1999) while the dictionary is being compiled.

4.2 Preparing the compilation of the WSG
Since Lusoga was not taught or examined in the education system in Uganda, there was no authority on the language. Documentation of Lusoga did not start until a few individuals and the Catholic Diocese in Jinja took the initiative to write what they thought was a description of Lusoga, mainly to fill the void. The documentation conducted in this context was not regulated, and the resulting literature was mostly based on personal convictions. At the inception of this research, visits were arranged to these individuals and centres in the hope of finding mentors, literature informants and editors. On approaching the writers of the only existing Lusoga books, however, it was soon discovered that, since they were not linguists, they could not account for the linguistic decisions taken in the writing of their books. They were very reluctant to respond to questions about the linguistic descriptions of Lusoga because they had no prior formal training on the structure of the language. Meetings with Basoga personalities, mainly elders, members of the Busoga royal government, the clergy and journalists, were a little more helpful. Though they showed interest in the research, they had only used Lusoga orally and had not written or read it in their entire lives. Most importantly, these personalities were also not linguists, and detailed academic research either interfered with their daily duties or burdened their normal lives. Notably, these personalities were also intimidated by their ignorance of their own language, even at the most basic levels of analysis: the nature of words, the spelling of words, and the rules of writing. There was no reference for this information in or on Lusoga, so it had to be specified for the dictionary. Unsurprisingly, it was almost impossible to engage these personalities to complete some tasks on time, if at all. Evidently, their anticipated role had been overrated, and this realization led to the reduction of their contribution to the research process. They were only maintained as informants, especially during the testing process. The Cultural Research Centre (CRC) was adopted as the outlet operation centre for meeting informants in order to sustain their involvement in the compilation of the WSG.
The problem of devising the writing system of Lusoga on which to base the compilation was therefore the sole responsibility of the researcher. In order to address the Lusoga literature gap, the study was broadened: instead of focusing on Lusoga lexicography alone, a wider scope was adopted to cover Bantu lexicography in the eastern and southern regions of Africa. The major Bantu languages in these regions, namely Luganda (Uganda), Swahili (Kenya and Tanzania), Zulu (South Africa), and Shona (Zimbabwe), were chosen as languages of reference. Fortunately, all the selected cognate languages were relatively well documented. Therefore, the theory of meaning was employed to assess the descriptions of the different aspects of the structure of Bantu languages, and comparative findings would then be tested for adoption, or be redefined or relegated. Application of the findings depended on the nature of the information and its relation to the nature of Lusoga as it is used by native speakers.

4.3 Collecting words
The search for written material in or on Lusoga uncovered a bilingual word list¹ entitled Dictionary Lusoga-English/English-Lusoga (CRC 2000a), and two attempts at the description of the Lusoga orthography, namely one by the Lusoga Ecumenical Board (LEB 2000),² and one by CRC and LULANDA (2001). Two versions of a Lusoga grammar also exist, one written by Babyale³ (1999) and another by CRC (2000). The CRC had some story books, which were procured (CRC 1999, 1999a, 1999b, 1999c, 2000b). The literature was further enriched with the Lusoga version of the New Testament (The Bible Society 1998) and Lusoga newspaper clippings of now defunct publishers, circulation of which ran for two years only (Kodh'eyo 1998-1999, and Ndiwulira 1998-1999). This was the literature from which words were collected for the compilation of the WSG. Computational linguistic studies were not available at the host university. Literature on corpora was found at other Bantu lexicography research centres. A visit to the Centre for Kiswahili Research (Taasisi ya Uchunguzi wa Kiswahili, TUKI) at the University of Dar es Salaam in June 2005 provided an insight into Swahili lexicography and corpus studies. However, none of the Swahili dictionary compilers employed corpus analysis and there was no corpus bank for Swahili at TUKI. Therefore, TUKI was not able to guide the study on the corpus analysis of Lusoga. The next research visit was to the Department of African Languages at the University of Pretoria in August 2005. This visit mainly served the purpose of providing literature on Shona and Zulu lexicography, but none at all on corpus analysis. The research was consequently narrowed down: instead of heading into a full corpus analysis of Lusoga, only methods of collecting words were sought.
All the mentioned texts were scanned and used to generate word lists using Shoebox.⁴ This is as far as the corpus analysis for the WSG could be taken. For this reason the WSG compilation was categorized as belonging to the non-corpus-based category.
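Generating a word list from scanned texts can be illustrated with a minimal Python sketch. It does not reproduce the Shoebox procedure used for the WSG; the tokenization rule (letters, optionally joined by an apostrophe) and the two sample strings, which simply reuse the dictionary's title, are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tokenization: runs of letters, optionally joined by an apostrophe
# (as in "ly'Olusoga"). The actual orthographic rules may differ.
TOKEN = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")

def build_word_list(texts):
    """Return (word, frequency) pairs, most frequent first, ties alphabetical."""
    counts = Counter()
    for text in texts:
        counts.update(token.lower() for token in TOKEN.findall(text))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Placeholder input standing in for the scanned source texts.
sample_texts = ["Eiwanika ly'Olusoga", "eiwanika ly'Olusoga eiwanika"]
word_list = build_word_list(sample_texts)
```

A list of this kind only records word forms and their frequencies. It does not provide the concordance lines or the sense discrimination that a full corpus analysis would add, which matches the limited role the word lists played in the compilation.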
In order to fill the void left by the limited references on Lusoga and the lack of an analysis of the Lusoga corpus, the theory of meaning was applied to specify the Lusoga lexicon and grammatical categories, which guided the specification of the lemmas considered in the WSG. The theory of meaning also helped to provide new evidence for the specification of Lusoga word categories and their respective citation forms, a detailed discussion of which is given in Section 4.4.2.

4.4 Compilation stage
This section draws only one example from the compilation process, the one considered to be the most controversial, which forms the basis for the WSG's deviation from the existing citations of Bantu words in dictionaries.⁵ The discussion in this section is therefore centred on a reconstruction of the nature of Bantu words from a Bantu language speaker's perspective (Bantu-centred approach) as opposed to the way the same words are viewed by European Bantu linguists (Euro-centred approach). The distinction between the two perspectives on Bantu lexicography is discussed as the basis for the classification and citation of verbs, nouns and adjectives introduced in the WSG. The following discussion presents a relatively new approach to the description of Bantu languages, for Bantu users in particular.

4.4.1 Background to the citation of Bantu words in dictionaries
The compilation of dictionaries in Africa was initiated by missionaries for the benefit of missionary governments. The description of the Bantu languages was primarily meant to give European audiences a good understanding of the Bantu languages. English in this context was used as the language of reference and the standard for describing Bantu languages. Studies of this nature are categorized as Euro-centred. Unfortunately, the need to satisfy the European audience compromises the full understanding of Bantu studies from a Bantu perspective, and these studies are therefore not Bantu-centred. This was hypothesized as the reason why most monolingual Bantu dictionaries reviewed in Nabirye (2008) were not suited for Bantu audiences, being difficult to access. For instance, there was no uniform citation of Bantu words in dictionaries. Bantu reference works such as TUKI (1981), Murphy (1972), Snoxall (1967), Hannan (1959), Blackledge and Kitching (1925), and Steere (1870) used varying citation forms for verbs, as shown in the selected examples below.

Steere (1870)
sema ku-, to say, to talk, to speak …
zika ku-, to bury …

Blackledge and Kitching (1925)
-agula, v. tr., scratch
-sesa v.c., cause to laugh

Snoxall (1967)
ku-menya v. tr. … break …
ku-duma v. i. … thunder …

TUKI (1981)
l.a kt … 'to eat'
lal.a kt … 'to sleep'

It was not clear which citation method best served Bantu dictionary audiences. Apparently the different citation forms were based on different interpretations of Bantu word classes; therefore the problem lay in the interpretation of Bantu morphology. Different morphological interpretations gave rise to varying citations because the Euro-centred descriptions of Bantu word classes were inadequate to give a full analysis of Bantu morphology. Compare in this regard Zgusta (1971: 224):

the posterior excerption of information frequently compels the lexicographer to change the whole construction of some entries, or shows him that there are some direct senses of some lexical units in the sphere of general language […] which he did not know, then probably the material on which the lexicographer based his construction of entries was not ripe enough, was not sufficient.
Consequently, a new approach to the understanding of Bantu morphology was required to rectify the problem of varying citations in Bantu lexicography.
Citing Bantu verbs and adjectives by their stems is quite contrary to a native speaker's natural usage of words, because words are used in full word forms.⁶ Some scholars, for example Kiango (2000: 29) and Hannan (1959: viii), cite an attempt to avoid an imbalance in the alphabet as the reason for the development of the stem tradition. Van Wyk (1995: 94) argues that no language seems to have a balanced distribution of entries across the alphabet, in the sense that each letter in the 26-letter alphabet has approximately 4% of all entries.⁷ Van Wyk raises the question whether the alphabet imbalance should actually be felt as an obstacle, since there are options that could be considered to assist users in overcoming the problem. Van Wyk's statistical treatment of alphabet balance helped to dispel the overriding concern for the alphabet. That concern not only pushes the lexicographer towards a prescriptive stance, it also does great harm to the language, because linguistic information that could result from the natural observation of the language is blocked.
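The alphabet-balance argument can be made concrete with a small Python sketch that compares the share of entries per initial letter under stem citation and under full-form citation. The six stems are taken from the dictionary excerpts quoted earlier and deliberately mix languages; the `ku` prefix follows the Snoxall examples. The function name and the toy data are assumptions for illustration, not figures from Van Wyk (1995) or from the WSG.

```python
from collections import Counter

def initial_letter_shares(headwords):
    """Map each initial letter to its share (0 to 1) of all headwords."""
    counts = Counter(word[0].lower() for word in headwords if word)
    total = sum(counts.values())
    return {letter: count / total for letter, count in sorted(counts.items())}

# Toy data: verb stems from the quoted excerpts, not a real lemma list.
stems = ["sema", "zika", "agula", "sesa", "menya", "duma"]
full_forms = ["ku" + stem for stem in stems]  # infinitive prefix + stem

uniform_share = 1 / 26  # about 3.85%, the "approximately 4%" per letter
stem_shares = initial_letter_shares(stems)
full_shares = initial_letter_shares(full_forms)
```

With this toy list, stem citation spreads the verbs over five letters, whereas full-form citation places every verb under a single letter. This is the imbalance the stem tradition tries to avoid. Whether users actually experience it as an obstacle is the question Van Wyk raises, and one the sketch cannot answer.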

Application of Nabirye's (2008) framework in addressing Bantu citations
In order to understand the composition of verbs and the reasons for their citation as stems, the theory of meaning was used to specify the meaning and function of the noun class prefix in forming Bantu verb infinitives. According to Kiingi (2007, 2007a), Bantu verbs are built around roots and are bound forms, with the exception of the verb -li, which can stand on its own. Because verbal prefixes and suffixes are not primarily part of the verb, an investigation into the meaning and role of these verbal affixes had to be undertaken. This investigation was carried out by analysing the marking of mood and infinitives in languages. English translations of Bantu verbs in the infinitive always take the preposition "to". The understanding of Bantu infinitives therefore started with an investigation of the meaning and function of "to" in the above-mentioned contexts. According to Duffley (1992: 141), "to" is not, strictly speaking, part of the infinitive, but it is a dematerialized preposition whose use is called for in certain contexts because of the meaning it expresses. Duffley (1992: 142) describes the infinitive as a representation of an event as a whole and says that "to" serves the purpose of contextualizing the element of time situated outside its event. "To" is used with infinitives both for the lexical and the grammatical meaning it brings into context. Therefore "to" is not a verb; it only functions with verbs in specific contexts for specific purposes.
English dictionaries do not include "to" in their verb entries. Bantu lexicographers have over a period of time been trying to import this aspect into Bantu lexicography by omitting or not considering the prefix oku in verb entries, with varying justifications. The problem is that English and Bantu languages have different structures, and what applies in one may not necessarily work in another. The starting point of the analysis of the Bantu languages was from the English perspective, and this perspective was imported into Bantu languages because it is a logical system that can be understood by the European audience of Bantu studies. The ignored disparity in language systems has consequently obscured specification of the most appropriate lemmatization of Bantu words in dictionaries meant for Bantu audiences.
In the analysis of the verb ending vowel, Palmer (1986) was used to analyse the marking of mood and modality in languages. Palmer (1986: 1) introduces two assumptions regarding the study of modality: (a) it is possible to recognize a grammatical category of modality which is similar to aspect, tense, number and gender; and (b) this category can be identified, described and compared across a number of different and unrelated languages. What is less obvious, however, is the characterization of the semantic function of modality. Palmer (1986: 2) notes that modality does not only or primarily relate semantically to the verb but to the whole sentence. For this reason, there are languages in which modality is marked elsewhere than on the verb or within a verbal complex.
According to Thrane (1983: 155), identification of grammatical categories across languages rests upon shared semantic characteristics called cross-linguistic equivalent classes. The identification of a typological category is consequently in terms of meaning. There may not be a precise definition, but there appears to be some very basic or "prototypical" feature similar for all languages. Palmer (1986: 4) also notes that the real problem with the study of modality is not just that there is great variation in meaning across languages, but that there is no clear basic feature. There is a large degree of arbitrariness in the choice of grammatical form, in the sense that it is not directly determined by meaning. Even with more easily definable categories such as tense and number, there is a very considerable difference in the extent of grammaticalization in different languages. There are even some languages that do not grammaticalize these familiar categories at all (Palmer 1986: 5). Palmer (1986: 6) concludes that it is difficult to decide what to include and what to exclude from a grammatical study of modality. He insists on grammatical relates because they enable the investigator to look at the languages themselves and see what is systematized and organized within their grammatical systems. Palmer (1986: 21) gives a clue when he claims that the term "'mood' is traditionally restricted to a category expressed in verbal morphology". He quotes Jespersen (1924: 373) who also insists that "it is a syntactic not a notational category which is in the form of the verb, and dictionary definitions usually refer to verbal inflections".
Nabirye (2008) therefore had to re-examine the full word forms of Lusoga, finding that the infinitive full word forms of Bantu verbs consist of the noun class prefix 15 (oku), the root, and mood marked on the verb final; in this case the verb ending vowel, as shown below:
noun class prefix 15 (oku) + root + verb ending vowel (mood)
Both the prefix and the suffix have to be present for Bantu verbs to be rendered as infinitives. Omission of either of the two would not create infinitive environments, which is why, without the prefix, stem entries are in the imperative mood. The prefix is a grammatical feature of gender and the verb final is that of mood. The noun prefix functions with the mood in the verb suffix to provide a timeless context for the verb root. This is the grammaticalized function of both the prefix and the suffix in forming Bantu verb infinitives.
The revised morphology of Lusoga words in the WSG was therefore treated as follows. The verb pre-prefix, which is used only occasionally but is required for the citation of full word forms, was placed in parentheses. The different morphological parts of words, which users failed to demarcate during the testing process,⁸ were segmented to guide users on Lusoga morphology. A new classification of Lusoga verbs was thus used to represent a word category with both noun and verb qualities. These verbs were moreover lemmatized under the first letter of the pre-prefix, namely "o".
Furthermore, full-form adjectives, which were cited with the noun class prefix 14 (obu), were classified as nouns, while words functioning as adjectives in their full form without the addition of noun prefixes were classified as the true adjectives of Lusoga. Dictionary users were guided on all these features in the front matter of the dictionary.
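The citation conventions described above for verbs (pre-prefix in parentheses, segmented word parts, lemmatization under "o") can be sketched as a small data structure. This is only an illustration under stated assumptions and not the format actually used in the WSG: the split of oku into a pre-prefix "o" and a class prefix "ku", the hyphen as segmentation mark, the default final vowel "a", and all class and method names are assumptions. The root "lal" is borrowed from the TUKI (1981) citation lal.a quoted earlier purely as a placeholder and has not been checked as a Lusoga root.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbCitation:
    """Hypothetical model of an infinitive citation form."""
    root: str                  # bound verb root
    final_vowel: str = "a"     # verb ending vowel carrying mood (assumed default)
    pre_prefix: str = "o"      # assumed pre-prefix part of oku
    class_prefix: str = "ku"   # assumed remainder of noun class prefix 15

    def full_form(self) -> str:
        """Full word form: prefix, root and verb ending vowel together."""
        return f"{self.pre_prefix}{self.class_prefix}{self.root}{self.final_vowel}"

    def segmented(self) -> str:
        """Segmented citation with the pre-prefix in parentheses."""
        return (f"({self.pre_prefix}){self.class_prefix}"
                f"-{self.root}-{self.final_vowel}")

    def stem(self) -> str:
        """Stem-style citation (no prefix), as in the stem tradition."""
        return f"{self.root}{self.final_vowel}"

    def lemma_letter(self) -> str:
        """Alphabetical stretch under which the full form is lemmatized."""
        return self.full_form()[0]


entry = VerbCitation(root="lal")  # placeholder root, see the note above
print(entry.segmented(), "is filed under", entry.lemma_letter())
print("stem-tradition citation:", entry.stem())
```

Keeping the stem as a derived value rather than the headword mirrors the argument of this section: the full word form is what is cited, while the segmentation still exposes the root to the user.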

Testing phase
Testing of each of the theoretical considerations in the compilation process was conducted during the compilation exercise. Questions for testing were based on the practical considerations in the third section of the framework. Each of the activities was allocated a minimum of four questions, and an overall evaluation was based on the average tally of percentages received from each section. The WSG was physically tested twice and the dictionary was very well received in the Lusoga-speaking community. This testing validated the findings of the research process and resulted in a deviation in the citation of Lusoga words from those in the prevailing Bantu dictionaries. Examples of how Lusoga words were entered are given in the Addendum.
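The scoring scheme described above can be illustrated with a short calculation. The sketch below is one possible reading and not the authors' actual procedure: it assumes each question is scored as passed or failed, each section receives the percentage of passed questions, and the overall evaluation is the plain mean of the section percentages. The section names and answers are hypothetical placeholders, not results from the study.

```python
from statistics import mean

MIN_QUESTIONS = 4  # each activity was allocated a minimum of four questions


def section_score(answers):
    """Percentage of passed questions in one section (answers are booleans)."""
    if len(answers) < MIN_QUESTIONS:
        raise ValueError(f"a section needs at least {MIN_QUESTIONS} questions")
    return 100.0 * sum(answers) / len(answers)


def overall_evaluation(sections):
    """Average of the per-section percentages."""
    return mean(section_score(answers) for answers in sections.values())


# Hypothetical placeholder responses (NOT data from the study).
responses = {
    "section_a": [True, True, False, False],  # 50% passed
    "section_b": [True, True, True, True],    # 100% passed
}
print(f"overall evaluation: {overall_evaluation(responses):.1f}%")
```

Averaging section percentages, rather than pooling all questions, gives every section equal weight regardless of how many questions it contains; whether the original evaluation weighted sections this way is not stated in the text.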

Conclusions from the application of Nabirye's (2008) framework
In the absence of a theory of meaning and modularity, the specification of the composition of the parts of full verb and adjective forms would have lacked a valid foundation and a relation to the compilation process. The available practical methodologies had not specified how the theoretical lexicographical decisions could be incorporated in the compilation exercise. A combination of both the theoretical and practical considerations in the dictionary compilation process is therefore necessary. The framework presented in this article can go a long way towards furthering the re-examination of further descriptions of African languages, and new perspectives on Bantu language descriptions may be discovered, based on the Bantu-centred approach, for the benefit of Bantu audiences especially.

1.
The Dictionary Lusoga-English/English-Lusoga (CRC 2000a) was found to fit the category of a 'word list' and is referred to as such in this article, in spite of having 'dictionary' in its title.

2.
The LEB (2000) was received with missing pages, with no details of authors or publisher.

3.
Though Babyale (1999) is mentioned here, it was not available for review during the research process and was not used in the collection of words either.

4.
Shoebox is a dictionary formatting program used by Nabirye (2008, 2009) to generate word lists for the WSG.

6.
The usage of full word forms was tested during the compilation process and was validated by the findings.

7.
For an in-depth analysis of the distribution of the alphabetical categories in dictionaries, see De Schryver (2005).

8.
Testing on the demarcation of word parts produced 100% failure, as all the respondents were unable to demarcate parts of words correctly, thus prompting the segmentation of entries in the WSG.

-onka [-onká] tbk: -enká.kgz.