The User Perspective in Lexicography: The Lemmatisation of Fixed Expressions in Duramazwi Guru reChiShona

The article discusses the user perspective and information retrieval in relation to the lemmatisation of specific multi-word lexical units, namely fixed expressions, in the Shona monolingual dictionary, Duramazwi Guru reChiShona. It shows that the decisions arrived at in lemmatising fixed expressions were influenced by a user-driven approach. The article gives a comparative analysis of how fixed expressions were treated in previous Shona dictionaries and how they were subsequently dealt with in Duramazwi Guru reChiShona. Previous dictionaries have grappled with the problem of giving fixed expressions as run-on entries. Against the background of the user perspective, it will be argued that the lemmatisation of fixed expressions in monolingual dictionaries has certain advantages over previously used strategies.


Introduction
The article discusses aspects of information retrieval in a monolingual dictionary against the background of the user perspective. In order to show that lemmatising specific multi-word lexical units (MWLUs), namely fixed expressions, creates a more user-friendly dictionary than entering them as run-on entries, it analyses the macrostructural treatment of fixed expressions in Duramazwi Guru reChiShona (2001), the advanced Shona monolingual dictionary. It is easier for a user to access information from the macrostructure than to search for it within another entry in the microstructure.
The macrostructure of a dictionary consists of the lemmata arranged according to a specific structure. Burkhanov (1998: 146-147) distinguishes three types of macrostructure in dictionaries. The first type is the ideographic macrostructure, where lemmata are organised by semantic affinity, while the second type is the alphabetical macrostructure, where lemmata are organised according to the alphabetical ordering of the graphical word. The analogical macrostructure constitutes the third type, involving a combination of both the ideographic and alphabetical ordering. The macrostructure that this article refers to is the alphabetical type. The microstructure of a dictionary, on the other hand, is the internal structure of the dictionary entry itself. For instance, pronunciation, the lexical category of a word, the ordering of senses, examples and synonyms are all part of the microstructure.
In this article, the term fixed expressions refers to proverbs, idioms and pithy sayings. A proverb is a statement full of wisdom handed down from generation to generation. Didactic in content, it is usually used to advise, admonish or instruct. Collins COBUILD English Language Dictionary (1991) defines a proverb as a short sentence that is often quoted, giving advice or telling something about human life and problems in general. Shona proverbs are complete statements having a bipartite structure consisting of two propositions: the first proposition foregrounds the proverb, while the second proposition complements the first. The overall meaning of the proverb is derived from the balance and cross-correspondence between the two propositions. Detailed descriptions of the form of Shona proverbs are given by Fortune (1975-1976) and Chimhundu (1980).
An idiom is a phrase whose meaning cannot be predicted or deduced from the individual meanings of its constituent parts. Its meaning is figurative (Kipfer 1984: 103). As its meaning is greater than the meaning of its constituent parts, it must be considered a unity. Structurally, Shona idioms are verbal in form, always beginning with the infinitive verb ku- 'to'.
Pithy sayings in Shona are neither proverbial nor idiomatic in meaning, but are statements used in everyday language to comment on events, or in some cases, to offer advice.
Duramazwi Guru reChiShona (hereafter referred to as DGS) is divided into two sections. Section 1 contains the main A-Z section of the dictionary including idioms, while section 2 contains proverbs and pithy sayings. The article will look specifically at the way proverbs, idioms and pithy sayings were lemmatised in DGS, with particular focus on fixed expressions as lemmata vs. fixed expressions as run-on entries. The article will show that a user-friendly dictionary takes into account the needs of the user regarding access to and retrieval of information. It will also indicate that the decision to lemmatise fixed expressions in the DGS was influenced by the user-driven approach.

The user perspective in lexicography
The user perspective or the user-driven approach in lexicography refers to the dictionary compiler's recognition of the importance of the needs of the user of the dictionary with reference to the type and function of the dictionary, and the structure of its entries in the macrostructure as well as in the microstructure.
The macrostructure is most important as it forms the access point to the information contained in the dictionary. Before a dictionary is compiled, the lexicographer has to identify the potential user of the dictionary and plan the nature and structure of the dictionary according to this target user. It is therefore the target user who determines the type of dictionary to be compiled. The target user also determines the overall macrostructural and microstructural characteristics of the dictionary. The other important aspect of any dictionary is whether the user is able to find the looked-for information. The macrostructure of the dictionary has to be designed in such a way that information retrieval is effected easily. In other words, the dictionary has to be user-friendly, which means that it has to satisfy the following parameters:
- Macrostructural accessibility: Is the user able to find the looked-for information?
- Information retrieval: Is the information found with the least cognitive effort?
- User-friendliness: Is the information given where the user expects to find it?
The guiding principle of the user-driven approach is that lexicographers should take the needs of the users into consideration during the planning and compilation stages of the dictionary project. A dictionary therefore has to be organised in such a way that it caters for the users' needs. Lexicographers should also not take for granted that all users have the necessary dictionary searching skills. Hence, a dictionary should be compiled in a way that meets the needs of both skilled and unskilled users, thereby simplifying information retrieval, which is of paramount importance in any dictionary. Dictionary users have to be able to find what they are looking for in as little time as possible. Gouws and Prinsloo (1997: 46) state:
User-friendliness in dictionaries implies that the contents of the dictionary are made as accessible to the user as possible. … The macrostructure remains the main access structure of any dictionary with a strict alphabetical ordering system. Lexicographers too often neglect the importance of a well-designed macrostructure as a functional component of the total linguistic contents of a dictionary …
Proverbs, idioms and pithy sayings are not only in daily use in speech and conversation but also form part of the Zimbabwean school curriculum. Students are examined not only on grammar, composition writing and comprehension skills, but are also tested for their ability to give the full forms of these fixed expressions as well as their meanings. Students may also be required to write an essay based on a proverb, an idiom or a pithy saying. In order to do this they have to know what a particular proverb, idiom or pithy saying means and how it is used in context. The justifiable assumption is that it is these students who consult a dictionary like the DGS. Meaning and context are the domain of dictionaries, hence the inclusion of fixed expressions in the DGS. By having fixed expressions lemmatised, the dictionary becomes a complete reference manual, where the user can find all the looked-for information in one place, which also saves time and cost.

The treatment of fixed expressions in previous Shona dictionaries
There is no set method for the presentation of MWLUs like fixed expressions in dictionaries. Usually lexicographers use the method that best suits the type of dictionary to be compiled, and the target users in question. The problem faced by lexicographers past and present has been whether MWLUs should be headwords, therefore being part of the macrostructure, or whether they should be treated under one of the single-word lemmata, thus forming part of the microstructure (Svensén 1993: 207). Svensén says that it has been difficult for lexicographers to break away from the principle that a headword is the same as a graphical word. Hence dictionary macrostructures have taken the graphical word as their basis. Gouws (2003: 39) points out that dictionaries have been characterized and dominated by what he refers to as a word-bias. This bias has been so entrenched in lexicography that the norm in dictionaries has always been to enter words and to nest any units larger than words into an entry as run-on entries.
There have, however, been inconsistencies in the way MWLUs have been treated in dictionaries. Čermák (2000: 490) identifies two methods in lexicographers' treatment of MWLUs. One approach is to list them under a specific sense of the single-word lemma. With this method, a proverb like Chara chimwe hachitswanyi inda (One finger cannot crush a louse) would be placed under either of the nouns chara (finger) or inda (louse), or under the verb -tswanya (crush). The other approach is to list them separately at the end of the lemma. With this method, one word in the MWLU would also have to be chosen as the lemma under which to include it.
This method is also problematic because, in Shona for instance, several phrasal entries can begin with the same word, such as -bata nekuseri kweruoko (lit. touch with the back of the hand, i.e. treat badly), -bata denga (lit. touch the sky, i.e. be overcome with joy) and -bata jongwe muromo (lit. touch the cock on the mouth, i.e. be an early riser). This would mean that an entry such as -bata (touch, hold), which in itself has 18 senses in the DGS, would be very congested. In terms of information retrieval, such an entry would not be considered user-friendly: the more congested an entry, the more difficult it is to retrieve the required information in a short time and with the least effort. Such an entry would possibly also be clumsily and awkwardly constructed.
In both methods cited by Čermák, MWLUs are treated as run-on entries under a single-word lemma. Čermák (2000), Moon (1992) and Lorentzen (1996) are critical of both methods as they result in many inconsistencies, leaving the user guessing under which sense an MWLU might have been listed. Consequently, it is difficult to find MWLUs in any reliable way, because, as Čermák (2000: 490) says, this guessing game is rather difficult to follow.
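The retrieval problem described here can be illustrated with a small sketch. The Python below is purely illustrative and does not reflect how any of the dictionaries discussed are actually stored: the names `run_on_dictionary`, `lemmatised_dictionary`, `lookup_run_on` and `lookup_lemmatised` are hypothetical, the choice of inda as host lemma is an arbitrary assumption, and the entries are reduced to the single proverb cited above.

```python
# Hypothetical miniature dictionaries built from the proverb cited in the text.
PROVERB = "chara chimwe hachitswanyi inda"
GLOSS = "One finger cannot crush a louse"

# Run-on model: the proverb is nested under ONE host lemma,
# chosen by the lexicographer (assumed here to be "inda").
run_on_dictionary = {
    "chara": {"senses": ["finger"], "run_ons": {}},
    "inda": {"senses": ["louse"], "run_ons": {PROVERB: GLOSS}},
    "-tswanya": {"senses": ["crush"], "run_ons": {}},
}

# Lemmatised model: the proverb is itself a lemma in the macrostructure.
lemmatised_dictionary = {
    "chara": "finger",
    "inda": "louse",
    "-tswanya": "crush",
    PROVERB: GLOSS,
}


def lookup_run_on(dictionary, expression, candidate_lemmata):
    """Open each candidate host entry in turn until the expression is found.

    Returns (gloss, number_of_entries_opened); gloss is None if not found.
    """
    opened = 0
    for lemma in candidate_lemmata:
        opened += 1
        entry = dictionary.get(lemma)
        if entry is not None and expression in entry["run_ons"]:
            return entry["run_ons"][expression], opened
    return None, opened


def lookup_lemmatised(dictionary, expression):
    """Look the expression up directly as a lemma; one entry is opened."""
    return dictionary.get(expression), 1


# A user who guesses the host lemma in the order chara, -tswanya, inda
# opens three entries before finding the proverb.
print(lookup_run_on(run_on_dictionary, PROVERB, ["chara", "-tswanya", "inda"]))
# With the proverb as a lemma, a single lookup suffices.
print(lookup_lemmatised(lemmatised_dictionary, PROVERB))
```

In this toy model the cost of the run-on approach depends entirely on whether the user's first guess matches the lexicographer's choice of host lemma, which is the guessing game Čermák refers to.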
With reference to stem lemmatisation vs. word lemmatisation, Gouws and Prinsloo (2005) point out that in Bantu language dictionaries lemmata are becoming heavier to the left. While this is true with regard to morphological considerations, lemmata are also becoming morphosyntactic. As a result of the changing trends in lexicography, dictionaries are now lemmatising units larger than the word in a bid to meet the needs of different users.

Proverbs
In previous Shona dictionaries, namely Hannan (1959, 1974) and Dale (1981), proverbs were not given lemma status but were entered as run-on entries under what was considered to be one of the semantically heavy words in the phrase or under the word where the user is most likely to look for it. The general definitions of the word would be given first and then, following that, the other forms or combinations in which the word is found. In Hannan, proverbs were presented as follows: (1)

Idioms
Idioms were also treated as run-on entries under what was perceived to be the most semantically heavy noun or verb. Their placement as run-on entries was problematic because a suitable noun or verb had to be chosen under which to include them. For instance, the idiom (ku)tamba nemadhaka pasina mvura (lit. to play with mud where there is no water, i.e. to do things that can get you into trouble) can be given as a run-on entry under the verb (ku)tamba (to play) or under the nouns madhaka (mud) or mvura (water).
Similarly, (ku)bata chibharo (to rape) can either be entered under the verb -bata (touch, hold) or under the noun chibharo (hard labour). Such an either-or situation can result in a confused user. Zgusta (1971: 276) says:
Apart from the problem of their selection, … there is the problem of where to list these expressions. Smaller dictionaries list them at the end of the entry … Bigger dictionaries list idiomatic expressions at the end of the examples of free combinations of words.
Hannan (1959) gave idioms as run-on entries under what he must have considered the most common noun or verb. The idioms are placed last within the entry, after the definitions and usage examples. In some cases, he marked them 'Note idiomatic usages'. For example:
With Hannan's method, it would not be easy for the user to find the idioms. Since the dictionary was targeted at users who wanted to learn Shona as a foreign language, the failure to label the fixed expressions accurately gives the impression that they do not exist or that they are not important.
In Dale (1981), idioms were also given as run-on entries. For example, the idioms (ku)bata maoko (to offer sympathy) and (ku)bata kumeso (to deceive) are both found under the verb (ku)bata, though Dale did not indicate that these are idiomatic usages but labelled them as transitive verbs. The following entry shows how idioms are presented in Dale:
There are no significant differences between Hannan's and Dale's treatment of proverbs and idioms. They both interchange the two methods identified by Čermák (2000: 490). With these methods, the user's search for proverbs and idioms is laborious and cumbersome. Although seemingly straightforward, these methods are in fact circular and inconsistent, because there is no rule for determining which is the so-called semantically heavy word; it is left to the individual lexicographer's decision. For example, in the case of -chenga 'keep close' in (1) and (3), the definition of the proverb Chenga ose manhanga, hapana risina mhodzi could be found either under the verb -chenga 'safeguard' or under the nouns manhanga 'pumpkins' or mhodzi 'seed'. This method is sufficient for the user who is acquainted with the full form of the proverb, but in cases where only a part of it is known, it will be very difficult if not impossible for the user to find it when it is treated as a run-on entry. The other weakness of this method is that information is more difficult to access because of the denseness of the entries into which different kinds of information are concentrated.
In Duramazwi reChiShona (1996), the predecessor to the DGS, some idioms were entered but not marked. They were given as run-on entries of the respective nouns or verbs from which they are derived. Thus, an idiom like (ku)dya marasha (lit. to eat fire, i.e. have a harsh temperament) (sense 5 in example (8)) is found under (ku)dya (eat). Entering idioms in this way would not have been easy in the DGS, because of the large number of idioms included. This method, as mentioned earlier, is also not user-friendly. That is why it was decided to give idioms lemma status in the DGS.

Proverbs
Shona proverbs (tsumo, abbreviated ts) are of two types: short sayings and complete statements having a bipartite structure. Examples of short sayings are:
(9) Rudo ibofu. ts. Iyi itsumo inoreva kuti kana munhu adzamirwa norudo haazombooni kuti pane chimwe chinhu chakaipa pamudikani wake. (lit. Love is blind.)
(10) Atsunya arwa. ts. Iyi itsumo inoreva kuti munhu anoita kanhu kadikidiki pane chimwe chiitiko chihombe anenge aitawo pake ipapo zvekuti vanhu havafaniri kumushora. (lit. He who has pinched has fought, i.e. he who has done little compared to others should not be despised.)
(11) Kunonoka huvhizhura. ts. Iyi itsumo inoreva kuti dzimwe nguva kunonoka ndiko kunowanisa munhu zvinhu zvakanwanda pane kukasira. (lit. To be late is a form of speed, i.e. at times being late in doing something can be rewarding.)
Such short sayings as given in examples (9)-(11) do not pose serious challenges in lemmatisation, because they can be compared to compounds or collocations in terms of the space the lemma takes up. Fortune (1975-1976: 26) refers to this type of proverb as "verbal sentences". The larger number of proverbs, which have a bipartite structure, were entered in shortened form. The shortening of the proverbs was not done haphazardly: the first proposition of the proverb was entered as lemma. The full form of the proverb was then given after the first proposition, indicated with a bullet (•) as marker, followed by the explanation in brackets, as shown in examples (12) and (13).
Example (13) shows that all proverbs having the same beginning and the same meaning were included under one entry.

Idioms
Idioms in Shona (madimikiva, abbreviated dimik) take the form of verbal phrases, that is, the initial word in the idiom is often an inflected verb. The trend in previous dictionaries was therefore to enter the idiom under the verb controlling its meaning. In the DGS, however, idioms were given lemma status.
Idioms in Shona all begin with the infinitive verb ku- 'to'. The verb stem was therefore hyphenated and entered in its alphabetical place, as can be seen from the entries of the verbs -bata 'catch, hold' and -dya 'eat' and the various idioms derived from these verbs. Because of space constraints only a few examples of the general definitions of the verbs are given (see examples (14)(a)-(d) and (15)(a)-(d)). In the DGS, the verb -bata has 18 senses and there are 46 idioms beginning with it. Similarly, -dya has only 4 senses but forms the beginning of 42 idioms. If these idioms had all been treated as run-on entries under the verbs -bata and -dya, these lemmata would have been very congested.
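The effect of entering idioms under the hyphenated verb stem can be sketched as a simple sort. The following Python is a hypothetical illustration only: the functions `sort_key` and `macrostructure_order` are not taken from the DGS project, and the ordering rule (ignore the leading hyphen, then compare strings character by character) is an assumption about how a strict alphabetical macrostructure might be produced. The lemmata are idioms and verbs mentioned in this article.

```python
def sort_key(lemma: str) -> str:
    """Assumed ordering rule: ignore the stem hyphen and letter case."""
    return lemma.lstrip("-").lower()


def macrostructure_order(lemmata):
    """Return the lemmata in the assumed alphabetical macrostructure order."""
    return sorted(lemmata, key=sort_key)


# Verb stems and idioms cited in the article, in arbitrary order.
lemmata = [
    "-dya marasha",
    "-bata maoko",
    "-dya",
    "-bata denga",
    "-bata",
    "-bata chibharo",
]

for lemma in macrostructure_order(lemmata):
    print(lemma)
```

Under this assumed rule each idiom lands directly after the verb stem it begins with, so the user who knows only the verb can still scan the neighbouring lemmata, while the verb entry itself stays uncluttered.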

Pithy sayings
Pithy sayings, categorised as kachirevo (abbreviated kr), were entered in the second section of the dictionary as part of the macrostructure. In form they resemble both proverbs and idioms. Some of them are two-word phrases such as (16), while others are complete statements such as (17) and (18).
(This is a pithy saying that says that what is done is done and people should not worry about what they can no longer change.)
(17) musha mukadzi kr. Kuti musha mukadzi kureva kuti, kuti murume ave nechiremerera, anofanira kuva nemudzimai anenge ari iye muchengeti wake anoona kuti basa repamba rafamba here. (To say that a woman is what makes a home means that for a man to be respected socially, he has to have a wife to look after him and to take care of the home.)
(18) imbwa hairangwe nekudimurwa muswe kr. Ichi chirevo chinoreva kuti chero munhu akakanganisa zvakadii asi haapiwi chirango chinozomurwadza kweupenyu hwake hwose. (This is a saying which means that if someone wrongs you, do not give them a harsh penalty that will affect them for the rest of their life.)
http://lexikos.journals.ac.za

Conclusion
A dictionary should be compiled in such a way that it is user-friendly. The best way to achieve this is to ensure that the user finds what he/she is looking for, which in turn requires that the information in the dictionary is easily accessible. This article has shown that in lemmatising fixed expressions such as proverbs, idioms and pithy sayings, Duramazwi Guru reChiShona attempted to avoid circularity, making it easier for the user to retrieve the information. The article has furthermore indicated that taking the target users of the dictionary, who are mainly students, into consideration also influenced the inclusion of fixed expressions as lemmata.