The Structure of an Afrikaans Collocation and Phrase Dictionary

In this article an Afrikaans collocation and phrase dictionary for mother-tongue speakers (primary target group) as well as advanced learners (secondary target group) is discussed. The position which such a dictionary occupies among other dictionary types is pointed out. A motivation is also given for the inclusion of idioms and other fixed phrases in the proposed dictionary. The three key approaches with regard to the interpretation of the term collocation are examined, i.e. the text-oriented approach of Halliday and Hasan (1976), the statistically-oriented approach of Sinclair (Collins Cobuild) and the significance-oriented approach of Hausmann (1984). The arguments in this article favour Benson et al.'s (1986) implementation of the significance-oriented approach. Statistical evidence could be used to examine the usage frequency of collocations and phrases. The advantages and/or disadvantages of these approaches are considered. Three types of words and their treatment in the dictionary are discussed: those which have a very wide range of combination, those which have selectional restrictions imposed by general semantic features, and those of which the range of combination is restricted by certain other words. It is argued that only the last two types should be included in this dictionary. As one of the target groups is unsophisticated learners with a limited grammatical background, the ideal would be to enter lexical collocations both at their bases and at the collocators. To save space, however, more information such as examples could then be provided at the bases only. Grammatical collocations should be entered at the bases, i.e. nouns, verbs and adjectives. The division of the dictionary articles into two components to meet the needs of both intended target groups is discussed.

Knowles (1997: 72) says: "It is a well-known but regrettable fact that very, very few language communities possess satisfactory collocations dictionaries ... The normal unavailability of collocations dictionaries is a great pity because that is exactly what advanced learners need, and indeed, what many native speakers hanker after too. In fact, it is not stretching things too far to say that first-class collocational control is the hallmark of the true L2 expert; collocational control is, of course, normally the last linguistic subsystem to be mastered by L2 learners who proceed to an advanced level."
Where some languages have several collocation dictionaries, Afrikaans has none. Several Afrikaans phrase dictionaries do, however, exist. In this article an Afrikaans collocation and phrase dictionary, which is presently being compiled, is discussed.

The place of the collocation dictionary among other dictionary types
According to Hausmann et al. (1989: XLII, XLIII) one can, in theory, differentiate between the following major syntagmatic dictionaries: the dictionary of syntactic patterns, the dictionary of collocations, the dictionary of set expressions and idioms, the dictionary of proverbs, the dictionary of quotations and the sentential dictionary. In practice, however, it is often difficult for the lexicographer to decide whether a certain word combination is a collocation or an idiom, since certain collocations contain semantically specialized constituents. Cowie (1981: 230) comments in this regard: "Restricted collocations and idioms are sufficiently related in terms of specialization of sense (of the part in the one case, of the whole in the other)." As the difference between collocations and idioms in this particular case is merely one of degree, this type of collocation can, within the cognitive approach, be regarded as nonprototypical idioms. Compare the following examples from Carstens (1992: 4): flou verskonings, vuil grappe, 'n koue blik, onverteerde feite. Benson (1989) uses the term "transitional collocation" for this category.
If the lexicographer experiences problems with these distinctions, how can he/she expect the user to know whether one should look up a certain word combination in a collocation dictionary or in an idiom dictionary? This does not suggest that there is no need for separate dictionaries with regard to certain target groups; compare Benson (1989: 5), who believes that idioms should be entered in idiom dictionaries and important idioms in general-purpose dictionaries. According to him, transitional collocations and technical collocations should be entered in collocation dictionaries. Benson (1990: 25-31) also maintains that our existing monolingual dictionaries should change and suggests the development of two types of monolingual dictionaries. The first is a monolingual decoding dictionary (MDD) that would include the largest possible number of "difficult" words and that would devote minimum space to collocations and the core vocabulary of a language. The second is a monolingual general-purpose dictionary (MGPD), intended for native speakers and learners who seek help with decoding and encoding language. "The learner who does not wish to use a learners' dictionary would find the MGPD ideal," Benson (1990: 27-28) argues. "Its decoding capability would be considerable, but, of course, would be less than that of the MDD. The encoding capability of the MGPD would be very strong, but it still could not compete with a specialized combinatory dictionary as a handbook for the production of texts."
Every dictionary is written within a specific time and social framework for a specific target group. One could compile a practical collocation dictionary (primarily) for advanced learners of Afrikaans (cf. Hausmann 1979, 1985) which also contains the most frequently used idioms and other fixed phrases, or a theory-oriented collocation dictionary (containing only collocations) for linguists, language practitioners and lexicographers (cf. Mel'cuk and Zolkovskij 1984: 43, 73). A third option was chosen for the dictionary which is presently being compiled. The dictionary will contain collocations and phrases and will be directed not only at mother-tongue speakers as primary target group but also at advanced learners as secondary target group.

The meaning of the term collocation
The term collocation should, however, first be defined, since it gives rise to different interpretations.
For Firth (1957) collocation refers to a co-occurrence relation between individual lexical items, such as dark night and you silly ass. A certain vagueness in the use of the term by Firth has given rise to a number of different interpretations, which can prototypically be identified as three key standpoints (cf. Herbst 1996: 380), namely (a) a text-oriented approach (cf. Halliday and Hasan 1976), (b) a statistically-oriented approach (cf. the Cobuild Project of Sinclair) and (c) a significance-oriented approach (cf. Hausmann 1984: 398). Herbst (1996: 380) evaluates the different approaches to collocation and comes to the following conclusions.
The text-oriented approach to collocation amounts to not much more than saying that in a text about coastal walking there is a certain likelihood for words such as coast, sea, path, climb or steep to occur as well. This kind of likelihood of co-occurrence of lexical items, however, seems to be determined to a greater degree by extralinguistic than by linguistic factors. The interpretation of collocation employed by Halliday and Hasan can probably be ignored. Hasan herself has shown that the usefulness of such an approach is limited (Herbst 1996: 383). It must also be doubted whether there is much point in using collocation for any kind of co-occurrence of two lexical items.
A purely statistical view of collocation as advocated by Sinclair seems problematical for a number of reasons. Firstly, there are the general problems involved in any kind of corpus analysis, especially regarding the representative nature of the material analysed. However, computer-assisted analysis may help overcome this problem. In this regard Smadja (1993) also suggests that a computer could be used to get a representative database. He points out that several approaches have been proposed to retrieve various types of collocations from the analysis of large samples of textual data. These techniques automatically produce large numbers of collocations along with statistical figures that reflect the relevance of the associations. None of these techniques provides functional information along with the collocations. Also, the results produced often contain improper word associations, i.e. not true collocations. Smadja (1993: 143-177) describes a set of techniques based on statistical methods for retrieving and identifying collocations from large textual corpora. These techniques produce a wide range of collocations and are based on some original filtering methods that allow the production of richer and higher-precision output. These techniques resulted in a lexicographical tool, Xtract. A lexicographical evaluation of Xtract shows that 80% of the identified collocations are correct. Church and Hanks (1990) and Church et al. (1991) emphasize the importance of human judgement used in conjunction with these tools. For the proposed dictionary the compiler's own database containing data (mainly from Afrikaans magazines) as well as the database of the publishers will be used.
The second problem that Herbst (1996: 383) points out with regard to a purely statistical view of collocation is that positional statements such as those produced by Sinclair (in Cobuild) are of limited value if one disregards the context. Greenbaum (1974/1988: 115) illustrated, for example, that the occurrence of particular adverbs is determined by a number of factors. It must be doubted whether a purely statistical kind of analysis is able to accommodate the complexity of such factors.
Finally, there is the problem of the limited power of statistical statements. Is dark night, for instance, a significant collocation because nights tend to be dark and not bright?
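To make the statistical view discussed above more concrete, the following is a minimal sketch of one common association measure, pointwise mutual information over adjacent word pairs, of the kind associated with Church and Hanks (1990). It is not the procedure used by Xtract, by Cobuild or for the proposed dictionary; the function name pmi_scores, the min_count threshold and the toy corpus are all assumptions made for illustration only.

```python
import math
from collections import Counter


def pmi_scores(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (log base 2).

    Pairs seen fewer than min_count times are skipped, because scores
    for very rare pairs are unreliable.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


# Hypothetical toy corpus; a real study would use a large text database.
tokens = (
    "dawerende applous vir die span en dawerende applous vir die koor "
    "en die span sing vir die koor"
).split()

scores = pmi_scores(tokens)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(" ".join(pair), round(score, 2))
```

A score of this kind only ranks candidates. As the text notes, it carries no functional information and cannot say why two words co-occur, so human judgement remains necessary.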
The significance-oriented approach makes provision for gradience. Any attempt to define collocation in this narrow sense can thus only aim at defining a kind of prototype of collocation, recognizing the gradient character of the distinction between collocations and free combinations on the one hand and between collocations and idioms on the other.

The macrostructure of the proposed dictionary
For the proposed dictionary on collocations and phrases the compiler decided to focus not only on the lexical and grammatical collocations as used by Benson et al. (1986), but also on semantic collocations.
There are words which have a very wide range, others where the selectional restrictions can be described through general semantic features, and words of which the range is restricted to certain other words. Svensen (1993: 102) uses the term semantic collocations for the second type of words.
The following options could be considered: (1) one does not include collocations of this type, (2) one includes the collocations just as one finds them in the data collection, or (3) one includes the collocations and uses a system where one indicates that certain words act as hyponyms and/or one uses selectional restrictions where possible.
As regards option (1), these collocations should be included for a number of reasons: (a) We live in a multilingual country where a large percentage of the people who speak Afrikaans are not mother-tongue speakers of Afrikaans. With regard to the use of synonyms, mention should be made of a small part of the research that was conducted with 20 first-year students. Sixteen students had Xhosa as their mother tongue and Afrikaans as their third language. Three students had English as mother tongue and only one had Afrikaans as mother tongue.
In sentence (1) 15 students chose the correct synonym, i.e. bereik, and 5 students chose the wrong one. In sentence (2) 14 students chose the correct synonym, i.e. behaal, and 6 students chose the wrong word. In sentence (3) 13 students chose the correct word, i.e. bereik, and 7 students chose the wrong word. (Of course behaal 'n doel is possible in Afrikaans, but then it is used in the context of sport.) In sentence (4) 14 students chose the correct word, i.e. behaal, while 6 chose the wrong one. In sentence (5) 10 students chose the correct word, i.e. bereik, and 10 students the wrong one.
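The tallies reported above can be summarized as error rates per test sentence. The short calculation below uses only the counts given in the text; the variable names are illustrative and not part of the original study.

```python
# Number of students (out of 20) who chose the correct synonym
# in each of the five test sentences, as reported in the text.
STUDENTS = 20
correct = {1: 15, 2: 14, 3: 13, 4: 14, 5: 10}
expected_word = {1: "bereik", 2: "behaal", 3: "bereik", 4: "behaal", 5: "bereik"}


def error_rate(n_correct, total=STUDENTS):
    """Share of students who chose the wrong word."""
    return (total - n_correct) / total


rates = {sentence: error_rate(n) for sentence, n in correct.items()}
overall = sum(STUDENTS - n for n in correct.values()) / (STUDENTS * len(correct))

for sentence in sorted(rates):
    print(f"sentence ({sentence}), {expected_word[sentence]}: "
          f"{rates[sentence]:.0%} wrong")
print(f"overall: {overall:.0%} wrong")
```

Between a quarter and a half of the answers per sentence were wrong, and about a third overall, which is the pattern the argument below relies on.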
Although bereik can combine with a wide range of words, one can therefore argue that one should include both bereik and behaal in the dictionary and have cross-references between them to help the user to choose the right word.
(d) Another motivation for including collocations is that one can present antonyms, again by using cross-referencing, e.g. aanbod/uitnodiging aanvaar/van die hand wys vs. argument aanvaar/verwerp, etc.
With regard to option (2), one could not consider this option because it implies that no collocates exist other than those listed in the dictionary, which is not true. Compare the following sentence with aanvaar: "Die vooruitsig op 'n bleskop aanvaar Cora Marie (a cancer patient) nou gelate." One of the meaning distinctions of aanvaar could be "berus in" (come to terms with) with the selection restriction "iets MOEILIKS of NEGATIEFS", followed by the most frequently used collocations. Therefore, one should include collocations with a wide range, provided that one combines this option with option (3): to use selection restrictions and/or hyponyms. An example of a hyponym could be verantwoordelikheid aanvaar, where verantwoordelikheid could then be replaced by e.g. pligte, toesig, etc.
One should, however, be very careful when deciding on the wording of selection restrictions. Carstens (1992: 4) states, for instance, that the verb pleeg (commit) is only selected in the presence of the meaning feature [+MISDAAD]/[+CRIME]. One does not, however, use pleeg only in the presence of the meaning feature [+MISDAAD], cf. Ek het ook 'n paar versies en skilderye gepleeg (HAT: 803). Furthermore, pleeg is not often used in combination with words indicating crime. Compare:
• molestasie pleeg vs. molesteer
• aanranding pleeg vs. aanrand
• verkragting pleeg vs. verkrag
• roof pleeg vs. beroof
• inbraak pleeg vs. inbreek, inbraak vind plaas
• smokkelary pleeg vs. smokkel, smokkelary vind plaas
• 'n verkeersoortreding pleeg vs. begaan
A third category of collocations includes words of which the range is restricted by other words, e.g. dawerende applous, die onderspit delf, etc.

Where should collocations be entered in the dictionary?

Hausmann's approach
Hausmann breaks down lexical collocations into a base and a collocator (1985: 119-121). In verb + noun collocations such as brand stig the noun is the base, and the verb is the collocator. In adjective + noun collocations such as dawerende applous the noun is once again the base, and the adjective is the collocator. In adverb + verb collocations such as haarfyn beskryf the verb is the base, and the adverb is the collocator. In adverb + adjective collocations such as blakend gesond the adjective is the base, and the adverb is the collocator.
In theory this works well with sophisticated learners who know the difference between the different parts of speech. However, apart from the fact that most users do not read the front matter of dictionaries, many learners struggle with parts of speech. Even if they know the difference between, for example, a noun and a verb in theory, they sometimes do not know whether an individual word is a noun or a verb because they do not know the meaning of the particular word. For unsophisticated learners with a limited grammatical background, the ideal would be to have these collocations entered both at the base and at the collocator. To save space, however, more information such as examples could perhaps be provided only at the bases.
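The entry policy described above can be sketched as a small indexing routine. This is only an illustration under stated assumptions: the part-of-speech tags are supplied by hand, the names BASE_POS, split_collocation and build_index are hypothetical, and the example sentences are placeholders rather than real citations.

```python
from collections import defaultdict

# Which part of speech is the base in each lexical pattern,
# following the four patterns listed above (after Hausmann 1985).
BASE_POS = {
    frozenset(["verb", "noun"]): "noun",
    frozenset(["adjective", "noun"]): "noun",
    frozenset(["adverb", "verb"]): "verb",
    frozenset(["adverb", "adjective"]): "adjective",
}

PLACEHOLDER = "(example sentence to be supplied)"


def split_collocation(words):
    """Take two (word, pos) pairs and return (base, collocator)."""
    (w1, p1), (w2, p2) = words
    base_pos = BASE_POS[frozenset([p1, p2])]
    return (w1, w2) if p1 == base_pos else (w2, w1)


def build_index(collocations):
    """Enter each collocation at both words; examples only at the base."""
    index = defaultdict(list)
    for phrase, words in collocations:
        base, collocator = split_collocation(words)
        index[base].append({"phrase": phrase, "example": PLACEHOLDER})
        # The collocator entry only points the user to the base.
        index[collocator].append({"phrase": phrase, "see": base})
    return dict(index)


collocations = [
    ("brand stig", [("brand", "noun"), ("stig", "verb")]),
    ("dawerende applous", [("dawerende", "adjective"), ("applous", "noun")]),
    ("haarfyn beskryf", [("haarfyn", "adverb"), ("beskryf", "verb")]),
    ("blakend gesond", [("blakend", "adverb"), ("gesond", "adjective")]),
]

index = build_index(collocations)
for headword in sorted(index):
    print(headword, index[headword])
```

With this layout a user who looks up dawerende without knowing that it is an adjective still finds the collocation, and is referred to applous for the fuller treatment.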
Hausmann does not refer to grammatical collocations. However, on the basis of his approach we can, following Benson (1986: 6), assume that: (a) if a grammatical collocation contains a noun, the noun is the base (vertroue in, neem 'n eed dat hy dit sal doen, plesier om te werk, op jou stukke/gemak); (b) if a grammatical collocation contains an adjective, the adjective is the base (oortuig dat, geheg aan); (c) if a grammatical collocation consists of a verb and a preposition, the verb is the base (dink aan, iemand herinner aan, jou vergryp aan); (d) if a grammatical collocation consists of a verb and a second verb in the infinitive, the first verb is the base (besluit om iets te doen, geniet om iets te doen).
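Rules (a) to (d) above can be read as a simple priority order. The sketch below assumes hand-tagged input and assumes that a noun takes precedence over an adjective, and an adjective over a verb, when more than one is present; that ordering is an interpretation for illustration, not something the rules state explicitly. The function name grammatical_base and the tag names are hypothetical.

```python
def grammatical_base(tagged):
    """Return the base of a grammatical collocation.

    tagged is a list of (word, pos) pairs in their original order.
    Assumed priority: first noun, else first adjective, else first verb.
    """
    for wanted in ("noun", "adjective", "verb"):
        for word, pos in tagged:
            if pos == wanted:
                return word
    return None


examples = {
    "vertroue in": [("vertroue", "noun"), ("in", "preposition")],
    "geheg aan": [("geheg", "adjective"), ("aan", "preposition")],
    "dink aan": [("dink", "verb"), ("aan", "preposition")],
    "besluit om iets te doen": [
        ("besluit", "verb"), ("om", "particle"), ("iets", "pronoun"),
        ("te", "particle"), ("doen", "verb"),
    ],
}

for phrase, tagged in examples.items():
    print(phrase, "->", grammatical_base(tagged))
```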

The microstructure of the proposed dictionary
The dictionary article will be divided into two interactive components.

The first component
In the first component combinations will be placed under the different polysemous senses of the lemma (i.e. the base of the combination) without examples.
In the case of collocations no definitions will be provided, since collocations are by definition transparent constructions (cf. Gouws 1989: 232). Transitional collocations and idioms will, however, be provided with definitions, and labels will be used to indicate nonstandard forms.
Fixed expressions where the lexical base does not semantically relate to any of the listed senses of the corresponding lemma will be included under a separate expression component. Polysemous senses will be ordered according to parts of speech. The primary model which will be followed is the BBI Dictionary of English Word Combinations.

The second component
The lack of adequate examples and the unusual nature of some of the examples in the existing Afrikaans standard dictionaries are often pointed out by dictionary reviewers and metalexicographers (cf. Lombard 1992: 148-164). In this dictionary the current situation will be rectified. In the second component, typographically distinguished from the first, there will be an example for every combination mentioned in the first component. This will have an encoding function, especially for the secondary target group. Sentences from spoken and written Afrikaans will reflect real Afrikaans as it is currently used. The ideal will therefore be to use as many citations as possible; however, verbal illustrations will be used when the need arises to illustrate more than one information type in the same sentence (cf. Gouws 1989: 233). There will be a direct relation between the examples in the second component and the labels used in the first component.
The arrangement of combinations in the first component of the article will most probably require that the potential users should use their linguistic intuition; a strict alphabetical arrangement (by using secondary keywords) of examples in the second component should make lighter demands on the dictionary reference skills of the secondary users, for whom this component is especially intended.
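The two-component article described above can be modelled as a small data structure. The sketch is illustrative only: the class and field names are hypothetical, the sense gloss is a placeholder, and the example strings stand in for real citations. The combinations themselves are taken from the article.

```python
from dataclasses import dataclass, field


@dataclass
class Combination:
    phrase: str
    keyword: str           # secondary keyword used to sort the examples
    example: str
    definition: str = ""   # only for transitional collocations and idioms
    label: str = ""        # e.g. a marker for nonstandard forms


@dataclass
class Sense:
    gloss: str
    combinations: list = field(default_factory=list)


@dataclass
class Article:
    lemma: str
    senses: list = field(default_factory=list)

    def first_component(self):
        """Combinations grouped under senses, without examples."""
        return [(s.gloss, [c.phrase for c in s.combinations])
                for s in self.senses]

    def second_component(self):
        """One example per combination, alphabetical by secondary keyword."""
        combos = [c for s in self.senses for c in s.combinations]
        return [(c.phrase, c.example)
                for c in sorted(combos, key=lambda c: c.keyword)]


CITATION = "(citation to be supplied)"
article = Article("aanvaar", [
    Sense("(placeholder gloss)", [
        Combination("verantwoordelikheid aanvaar", "verantwoordelikheid", CITATION),
        Combination("aanbod aanvaar", "aanbod", CITATION),
        Combination("argument aanvaar", "argument", CITATION),
    ]),
])

print(article.first_component())
print(article.second_component())
```

Keeping the examples in a separate, alphabetically sorted view mirrors the division of labour in the text: the first view relies on the user's sense distinctions, the second only on alphabetical lookup.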

Conclusion
There is an important place in Afrikaans lexicography for a specialized collocation and phrase dictionary from which both mother-tongue speakers and advanced learners of Afrikaans can benefit. This dictionary should be compiled according to theoretical criteria, but the specific needs and skills of the target users should be taken into account.