Lexicographical Resources in a Multilingual Environment: An Orientation

This article considers dictionaries as lexical information / knowledge sources to be derived from a deeper, underlying lexical database. These dictionary tokens or instantiations are, inter alia, specified by the users' needs. As a case in point of such a derivation meeting the needs of a multilingual society, a bidirectional bilingual learner dictionary is presented. Specific tools, such as editors with a reversal function, and models, such as the hub-and-spoke model, are discussed as means to function within the lexicographical infrastructure of a multilingual society.


Introduction
Having been asked by the organizers of this seminar to deal with the question 'which dictionaries are needed to fill the needs of a multilingual country such as South Africa', I have preferred to answer this question in a fairly general way. First of all, because I cannot give specific answers to such a question myself. It is for the speakers of this country who come after me to deal with the needs and preferences of the respective linguistic communities within South Africa.
Secondly, I have considered it much more informative for you (and feasible for me) to provide you in the first instance with a general and, in a sense, 'empty' framework for the construction and organization of a multilingual infrastructure. I do believe that such a framework is needed as a background against which to situate, compare and specify the diverse, concrete needs, preferences and products. In a sense, what is presented here is in line with the frame approach I have been advocating in other publications of mine (see e.g. Martin 1994), providing people with slots which they have to specify and fill out themselves.

1. Dictionaries
In accordance with the general approach just advocated, I will start from a very broad and even traditional definition of what is called a 'dictionary', defining it as:
- a book
- containing information on (aspects and / or parts of) the vocabulary
- of one or more languages
- by making use of two dimensions:
  - a macrostructure
  - and a microstructure
From the above it will have become clear that what is meant here is a printed dictionary and that the distinction between macro- and microstructure (the vertical and the horizontal dimension), although in itself an artificial convention, will serve both to bring together and to differentiate between such objects as explanatory monolingual dictionaries, pronouncing dictionaries, thesauri, frequency lists, monodirectional bilingual dictionaries, terminological dictionaries and encyclopedias, to name but a few.

Note: The seminar was held on 12 April 1996 in Stellenbosch under the title Lexicography as a Financial Asset in a Multilingual South Africa.

2. Different Kinds of Dictionaries
Although the above-mentioned objects can all be said to belong to the 'type' dictionary, this does not mean that we cannot make any distinctions between them as 'tokens'. Following Geeraerts 1984 in this respect, we will argue that the parameters which characterize dictionaries are linked to
- the selection and / or the presentation of
- their macro- and / or their microstructure.
In other words, all differences between 'dictionary-tokens' will be differences in the selection and / or presentation of their macro- and / or their microstructure. Distinct surface markers or indicators to differentiate between instantiations of the 'type' dictionary are then (also see Al-Kasimi 1977, quoting Malkiel 1959):
- scope (both macro- and microstructurally speaking: what kind of language population is the dictionary aiming at? which data categories does it want to account for?)
- linguistic perspective (e.g. does the dictionary describe language synchronically or diachronically?)
- entry arrangement (e.g. alphabetical, reverse alphabetical, chronological, onomasiological)
- number of languages (one or more)
- user group(s) (this 'marker' is, of course, of another order, namely based on function rather than on features; in this way it precedes or supersedes the ones already mentioned, user group(s) implying a certain scope, perspective etc.; it is kept here both because of its high degree of relevance and its relative distinctness).
By means of parameters such as the above one can now characterize different kinds of dictionaries. It would be of great help both for users and for makers of dictionaries to have / make this information available. The aim is not so much to arrive at a dictionary categorization / classification per se (the above parameters lead to a parametrization much more than to a fixed or rigid classification), but rather that such a characterization / specification is of great help in the planning, the design, the use and the evaluation of dictionaries.
In other words, a characterization of a dictionary such as the English-Dutch Van Dale as
- a dictionary
- meant for native speakers of Dutch
- for them to understand and translate
- contemporary English
- of both spoken and written 'texts'
- of a general nature
(see Martin-Tops 1989², Introduction, p. 13) is not just a superfluous metalexicographical minor detail, but a useful and necessary datum in the description of the lexicographical landscape of a language.

3. The Changing Lexicographical Landscape: Dictionaries versus Lexical Databases
One of the striking differences between lexicography nowadays and lexicography a couple of decades ago is that, with the advent of computers in lexicography, both an evolution in breadth and one in depth have taken place. As to the evolution in breadth: a shift can be observed from just one object or product of interest, the printed or paper dictionary, to a whole range of them. In Martin 1995a, in addition to the traditional paper dictionary, the following objects of interest are listed:
- computer-aided / based dictionaries (i.e. dictionaries developed with the help of computers)
- machine-readable dictionaries (i.e. dictionaries of which a machine-readable version exists)
- electronic dictionaries / term banks (dictionaries of which the data are electronically stored and which have retrieval software at their disposal)
- machine dictionaries (dictionaries for NLP systems, showing, as a rule, a higher degree of formalization of the data than is the case with paper dictionaries)
- lexical databases (dictionaries in database form, with a high degree of formalization, and retrieval software)
- AI-lexicons (comparable to lexical databases, involving, in addition to lexical knowledge, also knowledge about specific states of affairs)
Actually this list is open-ended and other 'types' such as hypertext dictionaries, multimedia dictionaries etc. could be added; what is important, however, is that, next to this broadening of the field, also a 'deepening', an evolution in depth, has taken place. As a rule, one can say that nowadays a printed dictionary is derived from an underlying lexical database, the latter being 'deeper' in that it is both more explicit and yet more general, i.e. not directed to just one user group. Eight years ago, in 1988, at the 3rd Euralex Conference in Budapest, Al and Martin could accordingly claim that the best way to arrive at user-friendly dictionaries was for them to be derived from non-user-oriented lexical databases. To quote the above-mentioned article: 'We would like to suppose that every new dictionary that is published nowadays has been derived from an underlying database (which can, of course, be more or less sophisticated). Furthermore we suppose that different types of users are in need of different types of dictionaries. (...) This does not imply, however, that one has to build up a completely separate database for each type of dictionary. For several reasons it is preferable to set up one "subjacent", fundamental database which is not user-oriented, and to derive from it as many user-oriented front-end databases as there are types of dictionaries. Editing, updating and further completion of the bare lexical data are activities which - in a certain sense - should be unrelated to the final products a publisher may have in mind. They concern the fundamental, non-user-oriented database. Front-end databases, on the contrary, are typically product-oriented. They contain specific selections which depend entirely on the needs of the users for whom the dictionaries involved are intended' (Martin-Al 1990:393).
Figure 1 (taken from Heid 1991) clearly illustrates this change in focus. In the centre of the figure is the lexical database, whose primary objective is to represent lexical knowledge. This knowledge is acquired and imported through interfaces from sources such as text corpora, other dictionaries, databases, human informants and the like (see the left-hand side of the figure). On the right-hand side are the concrete products (dictionaries), which are exported via interfaces, derived from the central lexical database and oriented towards specific users, thus leading to specific applications. What is needed first of all as a lexicographical resource, in my opinion, is therefore not so much one or more concrete products with which to meet the specific needs of certain language users, but much more the underlying database with which to meet not only these, but also other possible and important needs.

4. 'Dictionaries' in a Multilingual Environment
However true the above observation may be, it does not exempt one from having to conceive the possible derivations and products, the front-end databases, that in our case might be of particular interest in a multilingual society.
It is rather obvious that, next to lexicographical products which are to be found in a monolingual society (such as monolingual dictionaries), a multilingual society needs other products too, such as bilingual or translation dictionaries. Less obvious, perhaps, is that a multilingual environment may actually require a special kind of translation dictionary, e.g. a multifunctional one serving the needs of both the source and the target language users. Indeed, given the fact that a country such as the new South Africa has eleven official languages and so fifty-five different language pairs, producing monofunctional / monodirectional dictionaries (and so at least four dictionaries per language pair; see Kromann et al. 1991) would involve two hundred and twenty different dictionaries. Clearly this would become too costly an affair. Moreover, if a multilingual society is really to function properly, it should have learner dictionaries at its disposal, preferably bilingual bidirectional learner dictionaries, so that speakers of different language groups may learn each other's language. This does not, of course, alter the point of view that there should be just one database (per language) from which to derive all these different products. On the contrary, the more variety the different front-ends show, the more the need and usefulness of starting from just one common, single, non-oriented lexical database will be felt.
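The arithmetic behind these figures can be checked with a small sketch. The function names are illustrative only; the factor of four dictionaries per pair is the one cited from Kromann et al. 1991, read here (as an assumption) as one dictionary per translation direction and per user group.

```python
from math import comb

OFFICIAL_LANGUAGES = 11      # number of official languages given in the text
DICTIONARIES_PER_PAIR = 4    # assumption: 2 directions x 2 user groups per pair


def unordered_pairs(n_languages: int) -> int:
    """Number of distinct language pairs among n languages."""
    return comb(n_languages, 2)


def monofunctional_dictionaries(n_languages: int,
                                per_pair: int = DICTIONARIES_PER_PAIR) -> int:
    """Dictionaries needed if every pair gets `per_pair` separate dictionaries."""
    return unordered_pairs(n_languages) * per_pair


pairs = unordered_pairs(OFFICIAL_LANGUAGES)
total = monofunctional_dictionaries(OFFICIAL_LANGUAGES)
print(pairs, total)  # 55 220
```

The count grows quadratically with the number of languages, which is why a single multifunctional dictionary per pair is so much cheaper.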
By way of example, and because the notion 'bidirectional learner dictionary' is fairly new, I will dwell somewhat longer on this concept (see Martin 1987 and Hannay-Wekker 1995). Learner dictionaries are not to be seen here as primary teaching tools but as specific kinds of dictionaries with a supporting role in the language learning process. As such they may contribute to improving the learner's receptive and productive knowledge of the target language. However, where Martin 1987 still starts from a monodirectional learner dictionary, Hannay-Wekker 1995 present a bidirectional one, implying that the dictionary should function for two language groups at the same time, serving as both an L2-L1 and an L1-L2 dictionary.
An example entry of such an (experimental) bidirectional English-Dutch dictionary follows (see Hannay-Wekker 1995:28). It shows the following structure:
1. Headline (containing the headword and basic information about it)
2. Profile (containing a set of meaning descriptions and basic coded information on these translations, so as to be able to make a choice between the given possibilities)
3. Frame (listing the most frequent discoursal, syntactic, collocational and prepositional environments of the headword, organized according to the distinctions in the profile field)
4. Translations (including frequent contextual occurrences of the headword plus their translations)
5. Notes (providing information on synonyms and usage)
6. Expressions (comprising idioms, pragmatic formulas and proverbs)
The slots 'headline' and 'profile' are obligatory; the other ones are optional.

[Example entry not reproduced.]

As stated before, a bidirectional dictionary is multifunctional. In the above case of a bidirectional English-Dutch learner dictionary it is meant to be
(a) a productive dictionary of English for the speaker of Dutch
(b) a receptive dictionary of Dutch for the speaker of English
(c) a productive dictionary of Dutch for the speaker of English
Clearly then, as some of the information will be primarily meant for one type of language user and may even be superfluous for the other (see e.g. Frame, meant for the TL-speaker only), it is necessary for such a dictionary to have a clear structure allowing for different paths to be followed. Figure 2 gives an idea of how this can be achieved.
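The six-slot entry structure listed above can be sketched as a simple record type. This is only an illustration: the class name, field types and sample content are assumptions, not the format used by Hannay-Wekker 1995. The one rule taken from the text is that 'headline' and 'profile' are obligatory while the other slots are optional.

```python
from dataclasses import dataclass, field
from typing import List

# Slot names follow the six-part structure described in the text.
SLOT_ORDER = ("headline", "profile", "frame",
              "translations", "notes", "expressions")


@dataclass
class Entry:
    headline: str                       # obligatory
    profile: List[str]                  # obligatory
    frame: List[str] = field(default_factory=list)         # optional
    translations: List[str] = field(default_factory=list)  # optional
    notes: List[str] = field(default_factory=list)         # optional
    expressions: List[str] = field(default_factory=list)   # optional

    def __post_init__(self) -> None:
        # Enforce the two obligatory slots.
        if not self.headline:
            raise ValueError("headline is obligatory")
        if not self.profile:
            raise ValueError("profile is obligatory")

    def filled_slots(self) -> List[str]:
        """Names of the slots that actually carry content, in entry order."""
        return [name for name in SLOT_ORDER if getattr(self, name)]


# Hypothetical minimal entry, for illustration only.
entry = Entry(headline="horse", profile=["paard [certain animal]"])
print(entry.filled_slots())  # ['headline', 'profile']
```

A layout component could then choose, per user type, which of the filled slots to display, which is one way of realizing the 'different paths' mentioned above.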

5. Aspects (of the Construction) of a Multilingual Lexicographical Infrastructure
After having stressed the importance of theoretical insights both for the representation (see section 3) and for the application of lexical knowledge (see section 4), in this section I would like to take into account some aspects of the implementation side too, paying some attention to the pragmatics of setting up a multilingual lexicographical infrastructure and so dealing with such issues as general objectives, tools, models and actors.

General Objectives
I take for granted that the set-up of a multilingual infrastructure in a multilingual environment is, as such, not to be questioned: such a structure is simply to be regarded as a basic social benefit for the environment in question. However, I hold the view that the multilingual infrastructure one should aim at should lead to the production of lexicographical products (mono / bi / multilingual)
- of high quality
- within a coherent, anticipatory framework (thus superseding short-term planning)
- on an economically justified basis.
This should imply multifunctional, re-usable and linkable resources. Such resources presuppose appropriate tools, one of which, namely OMBI, will be presented in the next section as a case in point.

Tools
OMBI, an acronym based on the Dutch word group 'Omkeerbare Bilinguale Lexicale Databanken' (= Reversible Bilingual Lexical Databases), is an editor which was developed during the academic year 1994-1995 by the Dutch software house SERC (= Software Engineering Research Centre, Utrecht, The Netherlands) under the auspices of the CLVV (Commissie voor Lexicografische Vertaalvoorzieningen = Committee for Lexicographical Translation Resources), a committee of which I happen to be the chairman. This Committee is an intergovernmental body of lexical experts set up in 1993 by the Ministry of Education and Science of both Flanders and the Netherlands in order to improve and stimulate the production of bilingual dictionaries and lexical databases with Dutch as a source or target language. The Committee has up till now launched several lexicographical projects which are, commercially speaking, non-viable, yet of great social relevance. In this respect projects such as Turkish-Dutch v.v., Arabic-Dutch v.v., Polish-Dutch v.v. etc. need to be mentioned.
However, it is the Committee's task not only to have concrete products realized, but also to see to it that, if needed, adequate lexicographical tools and infrastructure are provided. The construction of OMBI is to be situated within this second domain, aiming at providing lexicographical teams with a generic and powerful editing tool.
Actually OMBI's main characteristics come down to the following: it functions as an editor which is generic, has importing and exporting facilities, and has the power to reverse lexical databases, trying to do so in as accurate a way as possible. As the last aspect is the most innovative, I will briefly deal only with this characteristic.
The fact that OMBI can reverse translational relations, and directional databases in general, makes it rather unique. While the editing function is busy creating a bilingual database X → Y, and as such taking in translations from X to Y, OMBI simultaneously stores the reversed counterparts, thereby building a reverse database Y → X. The end result is a non-directional bilingual database, from which databases and / or dictionaries in both directions can be automatically derived at a subsequent stage.
In order for the process and outcome of reversal to be non-trivial, the tool should not merely state that if word form x is a translation of word form y, then word form y is a translation of word form x. This is in many cases not only too limited a conclusion, but also a wrong one: only rarely is translation a straightforward symmetrical relation between word forms.
The first, highly important observation about translation relations is that it is not words that are translated into other words, but rather words in a specific meaning. The English word horse is a translation of the Dutch word paard, but only in the meaning of the latter as [certain animal], not in its meaning [certain chess piece]. This insight has had a fundamental influence on the architecture of the databases that OMBI builds. The database distinguishes between Form Units or FUs (word forms) and Lexical Units or LUs (meanings): every Form Unit (e.g. horse) can have one or more meanings (e.g. 1-[certain animal], 2-[certain chess piece], etc.); only an LU (which always belongs to an accompanying FU) can be translated by an LU in another language (see fig. 3).
The second important observation is that translation, and reversal of translation in particular, only holds if certain conditions are met. In OMBI the translation relation is analysed into four relevant parameters that influence reversibility, and which therefore have to be specified and taken into account in 'calculating' whether the reversal of the relation is valid or not. The four parameters are the following:
- conceptual equivalence
- pragmatic contrast
- variant status
- lexicalization status
This is not the place to discuss the OMBI calculus (see e.g. Martin-Tamm 1996 in this respect); however, seeing some of its results may give one an idea of how it works and of the role it could play in the establishment of a lexicographical infrastructure that is coherent and sound, yet economical.

Models
An action plan for a multilingual infrastructure involves not only a delineation of objectives and of the tools to realize them; models, in the sense of overall frameworks and / or approaches, are needed as well. In this respect I would like to briefly present a model which may be of particular interest, given a multilingual environment, namely the hub-and-spoke model (see Martin 1995 and Mashamaite 1995). The hub-and-spoke model is actually inspired by air traffic organization, in which certain airports (hubs) act as centres to and from which flights from other airports (spokes) operate. The key contention of this model for lexicography is that several bilingual dictionaries can be made by linking the lexical items of the spoke languages (source languages selected for the envisaged bilingual dictionary) to those of a common hub language (serving as the target language in the language pair). Suppose, e.g., that one has a region in which English, Sesotho sa Leboa and Tshivenda are accepted as official languages; then what should actually be provided are three lexicons which are linked to each other, forming a hub-and-spoke configuration (see fig. 4).
The linking itself consists in supplying and making explicit the kind of relationship that exists between the lexemes of the spoke languages and their equivalent hub lexemes. This equivalence relationship is rendered along the lines outlined in the preceding section, making use of the four parameters mentioned there, namely: conceptual equivalence, pragmatic contrast, lexicalization status and variant status. So, e.g., 'setimela' in Sesotho sa Leboa could be linked to 'train' in English through
- conceptual equivalence = complete
- pragmatic contrast = nil
- lexicalization status = lexicalized
- variant status = main
so that it could be reversed without any restrictions (train = setimela), and so that it could be linked to Tshivenda 'tshidimela', which is linked to English 'train' in the same way as 'setimela' is.
Applying this model in a multilingual environment such as South Africa, with its eleven official languages, would drastically reduce the work needed for the production of bilingual dictionaries. Instead of some hundred (actually 110) bilingual dictionaries to be made without any co-ordination at all, just ten lexical databases would need to be constructed and linked up, all in the same way, with a central (hub) lexical database, so that
(a) spoke and hub can be reversed
(b) spoke_i and spoke_j can be linked (and reversed)
Of course both calculi will involve blockings, just as they will need modifications and gap-filling as well. Yet it is my contention that the model leads to high quality, consistency and economy.

Actors
It will have become clear from the preceding that the setting up of a lexicographical infrastructure is a complex process involving many actors, such as lexicographers, metalexicographers, publishers, subsidizers and (official) language planners, not to forget the users. Each of them in turn shows different subtypes and categories. The exact role of each of them is difficult, not to say impossible, to define without knowledge of the exact contextual factors. May it therefore suffice here for me to state that, next to the main issues already dealt with, one action not yet mentioned is of primary importance for the success of the whole undertaking, namely the co-ordination of all activities and of all actors involved. The fact that this very Seminar is set up in this spirit of co-ordination and co-operation, bringing together representatives and experts from several fields and domains, is however proof that exactly this aspect has not been lost sight of.
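To make the hub-and-spoke model described under Models more concrete, here is a minimal sketch. It is not the OMBI calculus, which is not spelled out in this article: the class names, the gloss, and the rule that a link is freely reversible only when all four parameters have the values of the 'setimela' / 'train' example are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LexicalUnit:
    language: str
    form: str      # the Form Unit (FU) this meaning belongs to
    meaning: str   # gloss identifying the meaning (hypothetical wording)


@dataclass(frozen=True)
class Link:
    """Spoke-to-hub translation link with the four parameters from the text."""
    spoke: LexicalUnit
    hub: LexicalUnit
    conceptual_equivalence: str = "complete"
    pragmatic_contrast: str = "nil"
    lexicalization_status: str = "lexicalized"
    variant_status: str = "main"

    def freely_reversible(self) -> bool:
        # Assumption: any other parameter value blocks unrestricted reversal.
        return (self.conceptual_equivalence == "complete"
                and self.pragmatic_contrast == "nil"
                and self.lexicalization_status == "lexicalized"
                and self.variant_status == "main")


def link_spokes(links: List[Link]) -> List[Tuple[LexicalUnit, LexicalUnit]]:
    """Pair spoke LUs of different languages that share the same hub LU."""
    by_hub: Dict[LexicalUnit, List[LexicalUnit]] = {}
    for link in links:
        if link.freely_reversible():
            by_hub.setdefault(link.hub, []).append(link.spoke)
    pairs = []
    for spokes in by_hub.values():
        for i, first in enumerate(spokes):
            for second in spokes[i + 1:]:
                if first.language != second.language:
                    pairs.append((first, second))
    return pairs


def directed_pairs(n_languages: int) -> int:
    """Bilingual dictionaries needed without a hub (one per direction)."""
    return n_languages * (n_languages - 1)


def spoke_databases(n_languages: int) -> int:
    """Spoke databases to be linked to the single hub database."""
    return n_languages - 1


train = LexicalUnit("English", "train", "[rail vehicle]")
setimela = LexicalUnit("Sesotho sa Leboa", "setimela", "[rail vehicle]")
tshidimela = LexicalUnit("Tshivenda", "tshidimela", "[rail vehicle]")
links = [Link(setimela, train), Link(tshidimela, train)]

for first, second in link_spokes(links):
    print(first.form, "=", second.form)              # setimela = tshidimela
print(directed_pairs(11), spoke_databases(11))       # 110 10
```

The spoke-to-spoke pair is never entered by hand: it falls out of the two spoke-to-hub links, which is where the economy of the model comes from.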

Figure 1: Processes in Lexicography