Lexicography in West Africa: Preparing a Bilingual Kisi-English Dictionary

This paper presents some of the issues involved in preparing a bilingual dictionary for Kisi, an underdocumented language spoken in West Africa. Because the language possesses little in the way of literacy materials, fundamental issues such as orthography and word division had to be considered. In addition, no grammar of the language (or of its closest congeners) was available, and thus basic grammatical analysis had to be performed simultaneously. I briefly consider some of these problems, discussing the use of the lexical data base programs known as LEXWARE. I then focus on the specific problems raised by the expressive word class known to Africanists as ideophones. The conclusion, in the form of advice to future lexicographers of such languages, is that before undertaking such an endeavour, one must seriously assess its feasibility.


1 Introduction
In West Africa many languages possess little in the way of lexicographic materials, in fact, little in the way of any written materials.¹ This is particularly true of the less widely spoken languages belonging to the Atlantic Group of Niger-Congo, the latter being the phylum that contains two-thirds of Africa's languages (Bendor-Samuel 1989). The position of Atlantic within Niger-Congo is shown in (1). Atlantic languages are only distantly related to the Bantu languages of southern Africa, all of which belong to Benue-Congo.

(1) The position of Atlantic within Niger-Congo (tree diagram with nodes Niger-Congo, Atlantic-Congo, Volta-Congo)
The Atlantic language on which I focus is Kisi, a member of Atlantic's Southern Branch, spoken primarily in the Republic of Guinea but also in Sierra Leone and Liberia. The display in (2) shows Kisi and its closest relatives.
(2) The position of Kisi within Atlantic (Childs 1993b) (tree diagram; nodes include the Northern Branch)
Only a few Atlantic languages possess anything in the way of written materials. The exceptions are the more widely spoken languages such as Fula, Wolof, and Temne. For the less widely spoken varieties (some of which may be "dying" or already "dead", e.g., Keim of the Southern Branch (Heine 1970: 145)), written materials, if they exist, are rudimentary. They have been assembled by individuals with little linguistic, much less lexicographic, training and compiled primarily to assist non-native speakers in learning a language or in translating religious works and educational materials.³
This paper presents some of the problems involved in developing a dictionary for such a language. The first point to be made is that compiling a dictionary is a monumental task in terms of both time and resources. It should not be undertaken without some foreknowledge of what it entails, even with all the new software and miniaturized hardware available to assist in the task. Nonetheless, lexicography is a field well worth exploring, particularly as a way of probing a culture and discovering the mechanics of a language's operation. From a linguistic perspective, one learns the centrality of the lexicon to a language and confronts the incredible variation that forms a part of all speakers' competence.⁴
Crucial to any lexicographic undertaking is identifying the audience, for a dictionary can take many forms. Besides the monolingual vs. bilingual distinction, a dictionary can, for example, appeal strictly to the linguist. An example of such a work is A !Xóõ Dictionary, in which the entries are arranged phonologically because "the Dictionary was initially compiled as a resource for Khoisan phonologists and comparative linguists" (Traill 1993: 3). Entries are ordered by phonological class rather than alphabetically: clicks precede non-click stops, which precede fricatives, and so on. The sort of dictionary I set out to compile was aimed at an English-speaking, mostly non-Kisi audience, although literate Kisi would certainly find the work of use.
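Ordering entries by phonological class, as Traill's dictionary does, amounts to sorting on a custom key instead of alphabetical order. The Python sketch below is only an illustration under simplifying assumptions: it classifies just the first character of each headword, and its character sets (`CLICKS`, `STOPS`, `FRICATIVES`) are a hypothetical toy inventory, not Traill's actual ordering.

```python
# Sketch: order headwords by the phonological class of their initial segment
# (clicks < non-click stops < fricatives < everything else), then by code point.
# The character sets are a hypothetical toy inventory, not Traill's.
CLICKS = {"!", "|", "\u01c0", "\u01c1", "\u01c2", "\u01c3", "\u0298"}  # ! | ǀ ǁ ǂ ǃ ʘ
STOPS = set("pbtdkgq")
FRICATIVES = set("fvszxh")


def class_rank(segment: str) -> int:
    """Return 0 for clicks, 1 for non-click stops, 2 for fricatives, 3 otherwise."""
    if segment in CLICKS:
        return 0
    if segment in STOPS:
        return 1
    if segment in FRICATIVES:
        return 2
    return 3


def phonological_key(headword: str) -> tuple:
    # Classify only the first character; ties fall back to code-point order.
    return (class_rank(headword[:1]), headword)


headwords = ["sa", "ta", "!a", "ka", "fa", "|a", "ma"]
print(sorted(headwords, key=phonological_key))
# ['!a', '|a', 'ka', 'ta', 'fa', 'sa', 'ma']
```

A real dictionary would need a full segment inventory and a segmenter for multi-character symbols; the point here is only that the ordering lives in the sort key, not in the data.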
Concomitantly with compiling a dictionary, any lexicographer must be involved with issues of language standardization (e.g., Fasold 1984), often more a political than a linguistic activity (e.g., Eastman 1992). Other Kisi written materials exist, featuring more than three different orthographies, the form of which could be at least partially predicted from knowing who the colonizer was (cf. Awak 1992).⁵ None of these orthographies, however, had been extensively supported (cf. Heine 1992 for a different report on the Kisi spoken in Guinea), and thus established conventions could be balanced against linguistic considerations. This latitude allowed the dictionary to use a roughly phonemic orthography.
After a brief summary of the dictionary's history, I turn to some of the more technical aspects, concentrating in some detail on ideophones, an aspect that presents all Niger-Congo lexicographers with challenges and one that has not been satisfactorily resolved to date.

2 Project background
In 1979 I began to study Kisi as a linguistic object, having already learned to speak it in a functional way some years before. Fieldwork began in 1983, when I studied the linguistic importance of Kisi for the pidgins used by Kisi people in the area where Guinea, Liberia, and Sierra Leone meet. Awareness of my ignorance of Kisi grammar recommended studying Kisi itself. Because of the difficulty of securing support for work on "minor" languages,⁶ however, I have since been able to work on the dictionary only in a desultory manner. More discouraging is that the grammar and dictionary need checking with native speakers, a task that can be carried out only after some change in the political situation, for the past several years have seen the Kisi area of Liberia racked by civil war. I give this background information to inform the reader of the considerable persistence required to complete a project of this nature.⁷ Such an undertaking should be begun only after a realistic assessment of its pragmatic aspects. The technical aspects of preparing a dictionary for a little-known language, to which I now turn, also present a challenge, but one that is much more fulfilling and susceptible to resolution.

3 Some technical issues
Technical issues involve not only the choice of recording equipment, computer hardware and software, etc., but also purely linguistic, even theoretical, choices, e.g., the treatment of clitics, morphophonemic variants, etc. I give brief details of equipment choice and of linguistic decisions in this section, expatiating somewhat on the challenge ideophones pose to the conscientious lexicographer.

3.1 Hardware and software
I omit discussion of some of the more obvious and common concerns and turn first to several resources that proved valuable in determining the sort of software and hardware needed for lexicographic work. By way of preamble, it should be noted that I began collecting data in a distant and somewhat inaccessible locale. This fact determined many of my choices.
Works I found particularly helpful, aside from the standard lexicographic works (e.g., Zgusta 1971, Hartmann 1983), came from such missionary organizations as the Summer Institute of Linguistics (SIL) and the Lutheran Bible Translators (LBT), e.g., Hughes 1987, Bartholomew and Schoenhals 1983. Members of the latter organization had helped me even before I made my first trip back to West Africa (and have provided consultation since).⁸ The JAARS Center, a support organization for both SIL and LBT, also made available relevant reference materials and provided advice on the technical side of computer work in the field.⁹
As for the actual equipment one should use, a computer (or two) is absolutely indispensable. Portability is often crucial; fortunately, there are many excellent laptops on the market, at prices that continue to fall and with power that continues to increase.¹⁰
With regard to software, at the time I was preparing for fieldwork the most widely used software designed for linguists was the lexical data base system LEXWARE (Hsu 1990). Its appeal lay in its linguistic orientation and in the fact that it had already been used on ShiNzwani, a Bantu language closely related to Swahili and typologically similar to Kisi. LEXWARE is described as "a package of programs designed to help linguists compile and manage files of lexical data. It is oriented toward compilation and exploitation of a lexical file as a RESOURCE for further investigation into aspects of a language and culture, e.g., phonology, lexicon, grammar, semantics, ethnography, etc." (Hsu 1990: 1).
LEXWARE is both useful and powerful. It performs all the tasks expected of such software, chief among them sorting and organizing the data in a perspicuous fashion. It also generates a reverse dictionary, albeit one that is more an index than a true dictionary. The relative simplicity of its programs allows for a great deal of customization.
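The idea behind such a reverse index can be sketched in a few lines of Python. This is not LEXWARE's INVERT program; it assumes, following the convention described for the sample entry in this section, that an English word in a definition is flagged for indexing by a preceding asterisk. The function name `build_reverse_index` and the input shape (pairs of headword and definition) are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: an English word is flagged for the index by a preceding asterisk,
# so "*orange tree" indexes "orange" but not "tree".
MARKED_WORD = re.compile(r"\*([A-Za-z]+)")


def build_reverse_index(entries):
    """Map each asterisk-marked English word to the sorted headwords whose
    definitions contain it. `entries` is an iterable of (headword, definition)."""
    index = defaultdict(set)
    for headword, definition in entries:
        for english in MARKED_WORD.findall(definition):
            index[english.lower()].add(headword)
    # Sort both the English keys and the headword lists for stable output.
    return {word: sorted(heads) for word, heads in sorted(index.items())}


entries = [
    ("si/a/u\\wo/", "*orange tree"),
    ("si/a/u\\wa/ng", "*fruit of the orange tree; *orange; *orange juice"),
]
for english, headwords in build_reverse_index(entries).items():
    print(english, "->", ", ".join(headwords))
```

Because only flagged words are indexed, "tree" in the first definition and "juice" in the second are skipped, which is why the result reads more like an index than a dictionary.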
One great virtue of LEXWARE, owing to its wide use by linguists and its extensive documentation, is that it forces the compiler to analyse the language in detail and provides ample guidelines for doing so. It also forces one to make early decisions about grammatical categories and structures. Its versatile (and simple) architecture, however, allows early (wrong) decisions to be changed with relative ease as one's analysis and knowledge of the language progress.
A weakness that all lexicographers must consider, however, is that LEXWARE is not designed for publishing dictionaries, although the programs have been used for such purposes in many cases. It does not work on data with embedded word-processing codes but instead requires plain ASCII text. As can be seen in Appendix A (and (4) below), for example, Kisi tones have been represented as post-vocalic slashes and mid vowels as upper-case "O" and "E". One can certainly enter the data using a word processor but must make sure to feed it to the programs (which run in batch mode) only as DOS (plain ASCII) text.
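Converting such an ASCII encoding into a display form is a simple character mapping. The paper says only that tones are written as post-vocalic slashes and mid vowels as upper-case letters, so the Python sketch below assumes, hypothetically, that "/" marks a high tone (acute accent), "\" a low tone (grave accent), and that "E" and "O" stand for open-mid ɛ and ɔ. The function name `ascii_to_display` is likewise illustrative; it uses Unicode combining marks and NFC normalization from the standard library.

```python
import unicodedata

# Hypothetical mapping: the paper states only that tones are post-vocalic
# slashes and that upper-case "E" and "O" mark mid vowels.
TONE_MARKS = {"/": "\u0301", "\\": "\u0300"}  # assumed: / = high (acute), \ = low (grave)
VOWELS = {"E": "\u025b", "O": "\u0254"}       # assumed: E = ɛ, O = ɔ


def ascii_to_display(form: str) -> str:
    """Convert an ASCII-encoded form into Unicode with combining tone marks."""
    out = []
    for ch in form:
        if ch in TONE_MARKS:
            # A combining mark attaches to the vowel written just before it.
            out.append(TONE_MARKS[ch])
        else:
            out.append(VOWELS.get(ch, ch))
    # NFC composes vowel + mark into one code point where Unicode has one (e.g. í);
    # for ɛ and ɔ no precomposed letter exists, so the mark stays separate.
    return unicodedata.normalize("NFC", "".join(out))


print(ascii_to_display("si/a/u\\wo/"))  # síáùwó
```

Keeping the data in ASCII and converting only at output time preserves compatibility with batch programs that cannot handle anything else.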
Additional details of the programs can be found in Kari 1989,¹¹ which describes their application to Ahtna, an Athabaskan language of Alaska. SIL software such as IT, an interlinear text-processing program (Simons and Versaw 1992), and SHOEBOX, a data management program (Wimbish 1990), are other alternatives that should be considered. I now give a few details to illustrate the use of LEXWARE.
The lexical data base on which the LEXWARE programs operate is organized into what are called "bands" of information, each tied to a "headword" or to a sub-entry within that headword's entry. I give an (abbreviated) list of the sorts of information that can be encoded in (3).¹² The band denoted by "der", for example, contains derivationally related words that would not be listed as sub-entries; "lx" states whether the entry is a compound, a word, an affix, etc.
(3) Band name abbreviations and brief characterizations
headword (hw)
part of speech (ps)
definition (df)
written sources (wr)
phonetic and morphophonemic variants
dialectal variants
derivationally related words (der)
lexical information, size of the unit (lx)
morphological information
syntactic information
non-definitional semantic information
background information, encyclopedic
illustrations
collocational, idioms
discourse properties
usage
sociolinguistic
pragmatic information
existential status
etymological source
notes
In (4) I give a much-abbreviated sample entry to show how Kisi noun classes are represented. The stem here is sía- 'orange', which can take three different noun class markers (suffixes) to mean 'orange tree', 'orange (the fruit); orange juice', and 'drop of orange juice'. The first line contains the headword; the second indicates its part of speech, here a noun belonging to the o class; the third gives a definition; and so on. The asterisks are markers for the reverse sort program INVERT, and the non-alphabetic symbols illustrate the sorts of conventions that have to be adopted because the programs work on ASCII text only. The symbols "&", "%", and "|" indicate typographic directions; for example, "%" indicates that the following string must be italicized. Note also that sub-entries are preceded by two periods; sub-listings may be extended down to any number of levels. Alternative definitions within an entry are denoted by numbers, as in "1df" and "2df" below.
(4) Sample entry
.hw &si/a/u\wo/
ps n&o
df *orange tree (%*xylopia aethiopica|)
wr TSL: siawo
lx word
sr 1lA
..pl &si/a/u\wa/ng
ps n&ma
1df *fruit of the orange tree; *orange; *orange juice
2df *orange *tree
..sg &si/a/u\le/ng
df *fruit of the orange tree; *orange; drop of *orange juice
ps n&le
Appendix A contains a sample page from a formatted version of the data base; Appendix B has a page from the English index. The LEXWARE programs can also sort bands and produce statistics on the data base.
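To make the band structure concrete, the Python sketch below reads an entry laid out like the sample entry (4) into nested dictionaries. The format rules it assumes are a simplification inferred from the sample, not LEXWARE's specification: one band per line as a band name, a space, and a value; a line beginning with periods opens a (sub-)entry whose depth equals the number of periods. The function `parse_entry` is hypothetical, and the abbreviated sample text writes the index markers as asterisks, as the text describes.

```python
def parse_entry(text: str) -> dict:
    """Parse one band-formatted entry into nested dicts.

    Assumed format (a simplification of the sample entry): a line starting
    with one or more periods opens a (sub-)entry whose depth is the number
    of periods; every other line is "<band> <value>" and belongs to the most
    recently opened (sub-)entry.
    """
    root = None
    stack = []  # stack[d - 1] holds the open node at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("."):
            depth = len(line) - len(line.lstrip("."))
            tag, _, form = line[depth:].partition(" ")
            node = {"tag": tag, "form": form.strip(), "bands": [], "subentries": []}
            del stack[depth - 1:]  # close sibling and deeper nodes
            if stack:
                stack[-1]["subentries"].append(node)
            else:
                root = node
            stack.append(node)
        else:
            if not stack:
                raise ValueError(f"band line before any headword: {line!r}")
            band, _, value = line.partition(" ")
            stack[-1]["bands"].append((band, value.strip()))
    return root


SAMPLE = r"""
.hw &si/a/u\wo/
ps n&o
df *orange tree (%*xylopia aethiopica|)
wr TSL: siawo
lx word
..pl &si/a/u\wa/ng
ps n&ma
1df *fruit of the orange tree; *orange; *orange juice
2df *orange *tree
..sg &si/a/u\le/ng
df *fruit of the orange tree; *orange; drop of *orange juice
ps n&le
"""

entry = parse_entry(SAMPLE)
print(entry["form"], [sub["tag"] for sub in entry["subentries"]])
```

Tracking the open (sub-)entries on a stack is what lets sub-listings nest to any depth, as the text says LEXWARE permits.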

3.2 Linguistic issues
In this section I adumbrate the linguistic issues that I found most challenging in compiling a Kisi dictionary.
Lexicography and grammar. Certainly one has to begin lexicographic work with a thorough knowledge of the target language's grammar, yet this is hardly an attainable goal in any reasonable period of time, for the knowledge must reach a fairly intimate level. At the same time, one has to be familiar with the analytic tools, the formal representations, and the typological possibilities for language in general. A trade-off exists between the two, embodied in what could be called "the lexicographer's paradox", given in (5).

Reproduced by Sabinet Gateway under licence granted by the Publisher (dated 2011.)
(5) The lexicographer's paradox
a. The more sophisticated one becomes in one's approach to language, the less likely it is that one will be intimately familiar with a language other than one's own.
b. The more intimately one knows a language, the less likely one is to be familiar with alternative approaches, theories of language and linguistic analysis.
Dictionary and encyclopedic knowledge. One also has to ponder what sort of information should be included and what excluded. One criterion for that selection is the dichotomy variously described as that between denotation and connotation, sense and reference, and so on. Should one exclude encyclopedic knowledge from a dictionary entry? Where, in fact, is the boundary between dictionary and encyclopedic knowledge? It is generally agreed that knowledge of the semantics of a language (properly codified in something like a dictionary) is distinct from knowledge of the real world, which belongs in an encyclopedia alone (Haiman 1980: 331). Yet Bloomfield (1933, as discussed in Haiman 1980), for example, sees no precise way of defining words such as 'love' and 'hate'. In other words, at least in this case, semantics and knowledge of the world coincide. The easy answer to the lexicographer's dilemma is to exclude real-world knowledge, but that knowledge is often necessary, particularly when dealing with an "exotic" language and culture (cf. Busane 1990: 33-34). In the Kisi dictionary, I have chosen to include all cultural information that would not be general knowledge to an "outsider". For example, details of upland rice cultivation procedures are included, as are the procedures of the Poro and Sande (secret initiation societies for men and women, respectively).
Lumping and splitting. One antinomy basic to linguistic analysis is that between those who seek to differentiate maximally and those who look for similarities and seek to group exhaustively. This tension has played itself out dramatically, for example, in the controversy surrounding work on the classification of Amerindian languages, e.g., between Campbell (a splitter) and Greenberg (a lumper) (see Matisoff 1990 for discussion). The issue manifests itself in decisions about (grammatically) subclassifying lexical (as opposed to functional) items. For example, in most Niger-Congo languages the noun-class or gender systems allow for an easy categorization of nouns on morphological grounds (see (4) above), but verbs present problems. One has to consider argument structure, allowable complements, etc., and identify when collocational restrictions are determined semantically rather than syntactically.
Ideophones.¹³ Ideophones pose enormous problems to the lexicographer because of their monumental variation and semantic indeterminacy. The first problem, however, is their identity and coherence as a category. All words vary and likely form only prototype (as opposed to discrete and invariant) categories (e.g., Labov 1973), and ideophones form decidedly diffuse ones. Crosslinguistically, ideophones may constitute a syntactic category of their own, may form a subcategory of another, or may be found in multiple word categories. Yet in all of these cases, within a particular language, the ideophone category shades off into other word categories, showing a "squishiness" (Ross, e.g., 1972) at its edges.
Formally, ideophones vary a great deal, and this variation can be interpreted as neutralizing phonological contrasts that operate elsewhere in the language. In (6) I give an example of a set of ideophones all of which have the same meaning.

(6) Lumping "word clusters" in Gbeya/Gbaya (Samarin 1991)
ham, hem, hal, hɛl, pal, pɛl 'light (in weight)'
From a diachronic perspective, this is much like what Bolinger (e.g., 1940) calls "accretion": the gradual snowballing or clustering of meaning around a particular sound-meaning correspondence, forming a phonaesthetic partial, often known as sound symbolism, a normal process in language change. Worse still, with regard to the example in (6), some speakers will regard the minimally different forms as different words, while others will see them as the same word.
The solution for a splitter is to regard each slight variation as a different word. This is the approach adopted in Doke et al.'s (1990) Zulu dictionary, where, e.g., gqamfu, gqashu, gqimfu, and gqunsu all have the gloss 'of snapping'. The lexicographers proliferate entries even further by regarding doubling or reduplication as producing a separate form, e.g., cwayi and cwayicwayi, separate entries, both have the gloss 'of blinking'. Reduplication is commonly associated with ideophones and with expressive language in general. That this approach is characteristic of the dictionary can also be seen in the decision to list both a derivational affix and at least some of the forms both with and without that affix.
Lumping, on the other hand, the approach the Kisi dictionary follows, seeks to recognize such clusters explicitly and to provide them with a single (full) entry. The alternates cross-reference the head entry (through a band "cr", not shown in (3)) and are listed either as sub-entries or as phonological variants.
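A lumped entry with cross-referenced variants can be modelled as a single record that every alternate form points to. In the Python sketch below, `HeadEntry`, `build_lookup`, and `find` are hypothetical names, the choice of ham as the head form is arbitrary, and the fallback that treats a fully reduplicated form (XX) as a variant of X is an illustrative assumption, not a documented feature of the Kisi dictionary. The two entries combine the Gbaya cluster cited in (6) (with its vowels written as ɛ) and the Zulu pair cwayi/cwayicwayi only to exercise the code.

```python
from dataclasses import dataclass, field


@dataclass
class HeadEntry:
    """A lumped entry: one full entry plus the variants that cross-reference it."""
    headword: str
    gloss: str
    variants: list = field(default_factory=list)


def build_lookup(entries):
    """Map the headword and every variant to the same HeadEntry object."""
    lookup = {}
    for entry in entries:
        for form in [entry.headword, *entry.variants]:
            lookup[form] = entry
    return lookup


def find(lookup, form):
    """Look a form up; if absent, try it as a full reduplication (XX -> X)."""
    if form in lookup:
        return lookup[form]
    half, odd = divmod(len(form), 2)
    if half and not odd and form[:half] == form[half:]:
        return lookup.get(form[:half])
    return None


lookup = build_lookup([
    HeadEntry("ham", "light (in weight)", ["hem", "hal", "hɛl", "pal", "pɛl"]),  # Gbaya
    HeadEntry("cwayi", "of blinking"),                                          # Zulu
])
print(find(lookup, "pal").headword, find(lookup, "cwayicwayi").gloss)
```

Under a splitting approach each of these forms would instead be its own record, six for the Gbaya cluster alone; lumping keeps one full entry and lets the variants resolve to it.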
What is additionally problematic to the lexicographer from a phonological perspective is ideophones' use of phonemes that are not part of the phonological system of the matrix language. For example, some Zulu speakers reject ideophones containing [r] as "not Zulu words". In addition, ideophones exploit prosodic resources, e.g., an expanded pitch register or breathy or creaky voice, that are not easily represented in familiar orthographies. Is it the lexicographer's task to record these features faithfully? If so, how?
Semantically, an ideophone can do as little as underscore the meaning of the verb with which it has a close collocational association. Just as often, ideophones go to the other extreme and carry the entire semantic import of the predication. This occurs most transparently when they are introduced by a semantically bleached "dummy" verb such as thi 'say' in Zulu (e.g., Von Staden 1977: 214), the cognate ri in Venda (Poulos 1990: 422), or go in English, 'Tucker ran lickety-split down the road.' The latter case is not so problematic, but what does one do when the ideophone seems to have no independent meaning?
Another problem is semantic variation. Sometimes the form will hold constant and the meanings will change slightly from speaker to speaker and from dialect to dialect (see "enchainment" or "abduction" in Haiman 1985). Admittedly these same processes are at work in other parts of the language, but within the ideophonic subsystem they are more extensive and more frequent.
Another issue arising from the inclusion of ideophones in a dictionary is their troublesome pragmatics. With ideophones, meaning is often determined situationally by speaker and hearer. This becomes most apparent in performance situations, when a narrator publicly presents a story (Noss, e.g., 1988; Poulos 1990). In much the same way as gestures (see below), ideophones are embedded in social interaction. Determining the meaning of ideophones can prove incredibly frustrating to the lexicographer, since an ideophone requires a context for its interpretation much more than other words do. In addition, understanding ideophones requires an intensive knowledge of the language, a knowledge often inaccessible to an outsider (Samarin 1967).
The close functional relationship of ideophones to gestures raises some further problems. Gestures are poorly understood at best and have not been considered part of language (nor, by some, have ideophones), yet they often form a necessary concomitant to an ideophone. Gestures are decidedly a part of language, as McNeill has convincingly demonstrated: "gestures are an integral part of language as much as are words, phrases, and sentences - gesture and language are one system" [author's italics] (McNeill 1992: 2). That gestures and ideophones are closely linked has been commented on by many, e.g., Alexandre 1966. In a study of Japanese "mimetics", roughly comparable to ideophones, Kita finds that mimetics exhibit close synchrony with the stroke portion (the essential part) of gestures 98% of the time (Kita 1992); Zulu ideophones exhibit the same sort of synchrony in a pilot study I have performed. How, then, should gestures be represented?
Because of these many problems, ideophones have often been ignored, e.g., Munro and Caye 1991, or considered only peripherally (Institute for Swahili Research 1981). Such an approach (and it has also been followed in grammatical descriptions) is irresponsible, especially when ideophones constitute a significant portion of a language's lexicon (5,000 in Gbeya (Samarin 1978)¹⁴) or an open and productive class, e.g., in Igbo (Maduka 1983-84).

4 Conclusion
What I have tried to do in this paper is give a personal and partial account of work on the lexicon (and grammar) of Kisi. I have exemplified several of the issues that arise in the process of compiling such a work, hinting that the theoretical or purely lexicographic issues admit of easier resolution than the purely pragmatic ones. This emphasis should be interpreted as a warning, but these caveats should not dampen anyone's enthusiasm. Numerous languages, particularly in Africa, need the attention of a dedicated lexicographer. Admittedly the work has a steep learning curve, but both the knowledge gained and the output are very satisfying. Another consolation is that once one has completed a dictionary, the next one is much, much easier.

Endnotes
1 Hartmann 1990 contains a sketchy and perhaps uneven assessment of lexicography in Africa. See Prinsloo 1991 for a review.
2 Note that the question marks represent uncertainty on Williamson's part.
3 Childs 1993b contains a brief survey of Atlantic.
4 See Samuel Johnson's definition of a lexicographer as "a harmless drudge". These are not the only rewards; there is often a small but appreciative audience. There may be expressions of gratitude from the speakers of the target language and from expatriates, for example, Peace Corps volunteers and missionaries. In addition to the expected benefit to linguists of such work, one is able to contribute to such projects as the Useful Plants Project of the Royal Botanic Gardens at Kew (United Kingdom).
5 Although Sierra Leone was colonized by the English and Guinea by the French, one cannot really say that Liberia was colonized by the Americans. Nonetheless, American influence has been pervasive and certainly differs in its effect on the Kisi orthography used in Liberia. There are actually two different systems in Liberia, one developed by the Anglo-American Church of England missionaries in Bolahun, the other by the American Lutherans in the Foya area.
6 As recent exchanges in linguistics journals (e.g., Hale et al. 1992) and on the "Linguist" electronic bulletin board document, there is concern for documenting the less widely spoken languages (cf. Ladefoged 1992).
7 The reader might want to compare this account with Kari's (1989) project history of his work on Ahtna, an Athabaskan language of Alaska, a decidedly more stable area than Liberia.
8 I should also mention the not inconsiderable assistance provided me by Stanley Cushingham of the Center for Applied Research in African Languages (New Haven, CT (USA)).
9 Their particulars are: International Computer Services, JAARS Center, Box 248, Waxhaw, NC 28173 (USA); (704) 843-6000.
10 One wonders what Murray would have been able to accomplish had he had access to a computer in his compilation of the OED!
11 Kari's dictionary (the Ahtna-English portion) contains some 6 000 lexical entries and over 9 300 example derivations and sentences. The English-Ahtna side has 10 500 entries.
12 Bands can be proliferated at will, and not all entries will have all bands filled. One lexical data base using LEXWARE has over 300 bands (Hsu 1990: 23); Kisi now has forty-eight.
13 I assume here some familiarity with ideophones. For an early survey of Bantu ideophones, see Samarin 1971. Childs 1993a contains a more recent and extensive discussion.
14 In a Gbeya (= Gbaya) French-English dictionary, ideophones constituted 24.6% of the lexicon (8544 entries) (Noss 1985: 242).