Internationalisation, Localisation and Customisation Aspects of the

Abstract: TshwaneLex is the world's only lexicography software suite with which the entire lexicographic process, from initial compilation all the way to final product, may be conducted in the language of one's choice. This is possible thanks to various aspects of internationalisation, localisation and customisation that are built into TshwaneLex. These are discussed by means of examples drawn from a wide variety of projects and languages.
 
Keywords: INTERNATIONALISATION, LOCALISATION, MAINSTREAM LOCALISATION, DEVELOPMENT LOCALISATION, BLOWBACK LOCALISATION, CUSTOMISATION, TSHWANELEX, LEXICOGRAPHY, DICTIONARY, SOFTWARE, LANGUAGE INTERFACE PACK (LIP), CILUBÀ, ISIZULU, SESOTHO SA LEBOA, SETSWANA, SWAHILI, WELSH
 
Summary: Aspects of internationalisation, localisation and customisability in the dictionary application TshwaneLex. TshwaneLex is the world's only lexicographic software suite with which the entire lexicographic process, from initial compilation up to and including the final product, can be carried out in a language of one's choice. This is possible thanks to various aspects of internationalisation, localisation and customisability that have been built into TshwaneLex. These are discussed with the help of examples from a broad range of projects and languages.
 
Keywords: INTERNATIONALISATION, LOCALISATION, MAINSTREAM LOCALISATION, DEVELOPMENT LOCALISATION, FEEDBACK LOCALISATION, CUSTOMISABILITY, TSHWANELEX, LEXICOGRAPHY, DICTIONARY, SOFTWARE, LANGUAGE INTERFACE PACK (LIP), CILUBÀ, ISIZULU, SESOTHO SA LEBOA, SETSWANA, SWAHILI, WELSH


Localising Software in Today's Global Village
It should not come as a surprise that in today's global village, the localisation of software is picking up momentum. Just this April, for example, the software giant Microsoft launched its isiZulu LIP (Language Interface Pack) for Windows XP, which was followed in June by the LIP for Setswana. By the time this article goes to press, the Afrikaans version is also expected to have been released. What is true for these generic mass-produced applications has now also become a reality for the highly specialised field of lexicography. Indeed, TshwaneLex, the world's only truly off-the-shelf dictionary compilation software and the flagship product of TshwaneDJe HLT, is now not only fully customisable but also fully localisable. This means that the entire lexicographic process, from initial compilation all the way to final product, may henceforth be conducted in any language of one's choice. The purpose of this article is to highlight the main aspects that make this possible.

Internationalisation and Localisation: What are these?
The two key concepts 'internationalisation' and 'localisation' are summarised as follows by Schmitz (2006: slides 3 and 4):
What is internationalization (I18N)? Developing a product or service in such a way that it will be easy to adapt to other markets (languages and cultures)
Note the crucial difference between the first two words in each definition: 'developing' versus 'adapting'. Internationalisation (often abbreviated to just I18N, being the first and last letters with '18' representing the number of letters in between, and pronounced 'I-eighteen') is thus what developers do once, while localisation (analogously abbreviated to L10N and pronounced 'L-ten') is what one has to do over and over. This leads Schmitz (2006: slide 6) to conclude that "the more stuff you push into I18N out of L10N, the less complicated and expensive the process becomes". Although Schmitz's conclusion is of course true, one may deplore the focus on the monetary aspect. This focus is in fact the result of current 'mainstream localisation' efforts, summarised by Schäler (2006: slide 8) as follows:
Increase return on investment (ROI)
• IF there are markets rich enough to buy our products
• THEN adapt our already developed products superficially to the requirements of these markets (with a minimum effort) AND sell them into these new markets for a similar price as the original product (there is no easier way to make money)
Schäler (2006: slide 10) is indeed correct when he claims that "mainstream localisation efforts focus on short-term return on investment and ignore currently non-profitable regions".
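The numeronym convention just described (first letter, number of letters in between, last letter) can be made explicit in a few lines. The following Python sketch is purely illustrative and is not part of TshwaneLex; the function name numeronym is hypothetical.

```python
def numeronym(word: str) -> str:
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    if len(word) < 3:
        # Nothing lies between the first and last letter; leave the word unchanged.
        return word
    inner = len(word) - 2  # letters strictly between the first and the last
    return f"{word[0]}{inner}{word[-1]}".upper()


for term in ("internationalisation", "localisation"):
    print(term, "->", numeronym(term))
```

Since the British '-isation' and American '-ization' spellings have the same length, both yield I18N and L10N.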

Brief Internationalisation and Localisation Case Study: Microsoft's LIPs (Language Interface Packs) for South Africa
If one focuses on the South African market, as mentioned at the outset, one sees that work has been done on three localised versions of the Windows XP operating system, by means of the creation of three LIPs. In the words of Rolfe (2002: abstract):
A Language Interface Pack (LIP) is a package that allows users to install a particular language skin on top of their English operating system to provide them with an almost fully localized operating system (OS) User Interface (UI) for the chosen language. This is accomplished via Microsoft's Unicode-based Multilanguage User Interface (MUI) technology which allows the localization of resources for the most visible and most commonly used features of the operating system.
Although seemingly very impressive, what exactly are those "most visible and most commonly used features"? According to Rolfe (2002: slide 24) one needs to answer the following question:
In other words, for LIPs the '20-80 rule' is used, meaning that only the 20% of the user interface that is used 80% of the time is localised, and the above shows a partial list of the top-level elements defined to be localised.
In order to visualise, literally, what this means during actual use, Addendum 1 shows the start page of the Outlook Express e-mail client, supposedly in Setswana. Although this application was chosen by Ntaoleng Motaung when she proudly showcased the LIP for Setswana at AFRILEX 2006, one is immediately struck by the number of untranslated strings. One does not even need to "go deep into the application" (Ntaoleng 2006) to find parts that are not in Setswana, as one is merely looking at the start page. Basics such as 'There is 1 unread mail message in your inbox' have only been partially translated, here to There is 1 molaetsa wa poso e e sa buisiwang mo bokosokgorogelon* [sic; should be bokosokgorogelong] ya gago. Many more untranslated sections are visible in the screenshot. Note further that even basic button texts such as 'Previous' and 'Next' are displayed in English. Of course, there are Setswana translations for these concepts, and in various other places in Windows XP these have indeed been translated. See in this regard, for example, Addendum 2, a screenshot of one of the first questions one has to answer in order to install the Setswana LIP. Here Morago stands for 'Previous' and E e latelang for 'Next'.
While the above observations could be viewed as 'unfortunate', one really starts to be concerned when one notices the high degree of inconsistency. Simply clicking on the Simolola 'Start' button in Setswana reveals the pop-up window shown in Addendum 3, left. There is no reason why the capitalisation of Me 'My' should vary in the top-right corner. It is highly surprising that such basic errors and inconsistencies could have crept into the localisation effort since, with the right localisation tools, these could all easily have been avoided. In addition, however, it is clear that the translators and proofreaders did a very sloppy overall job, as exemplified in Addendum 5. In the string 'Add or Remove Programs', for example, the English 'or' ended up being glued to the Setswana translation gongwe 'or', resulting in the erroneous Fetola gongweor* tlosa Diporokeramo. (Note, again, the inconsistency in word-initial capitalisation here.) At the IATIS 2006 conference, Dwayne Bailey, Director of the Zuza Software Foundation, referred to Microsoft's half-baked LIPs as "merely paying LIP service to the South African languages" (Bailey 2006). Given that Bailey supervised the creation of the complete, freely available OpenOffice.org 2.0 suite (which includes the word processor Writer, the spreadsheet component Calc, the presentation application Impress, the graphics package Draw, the database program Base, and the equations and formulae tool Math), and this in all eleven of South Africa's official languages, he more than anyone has reason to speak. The current South African LIPs are an affront to the local languages, and in no way does this type of localisation show good practice for others to copy.

Mainstream, Development and Blowback Localisation
Against the backdrop just sketched, it is not surprising that localisation professionals such as Schäler (2006: slide 27) conclude that the "short-term return-on-investment cannot remain the only driving force behind the localisation effort". In this respect, the 'Global Initiative for Local Computing' (GILC), whose motto is 'Localisation is not an option; it is a fundamental right', was launched in September 2005 to move away from 'mainstream localisation' towards 'development localisation'. In the words of Schäler (2006: slide 13):
The main driver of the development localisation efforts is the belief that a more inspirational, visionary and innovative perspective on localisation includes a variety of reasons to localise. Among them are:
• Social reasons: the intent to bridge social divides;
• Political reasons: the effort to provide equal access to electronic information;
• Cultural reasons: the belief in the need for linguistic and cultural diversity;
• Long-term investment reasons: the conviction that business needs to take a long-term approach and invest sensibly in new and emerging markets.
Schäler (2006: slide 27) further envisages the future creation of a 'blowback effect', whereby custom solutions and technologies launched in developing countries are picked up by, or fed back into, the developed world. This future 'blowback localisation' would then allow for a genuine exchange, or two-way traffic.
http://lexikos.journals.ac.za
Arguably, this future may already be with us when it comes to lexicography software. When one studies the highly innovative internationalisation and localisation aspects of TshwaneLex, one realises that various useful principles developed in TshwaneLex could, and should, be adopted elsewhere. In order to fully appreciate these aspects, however, it is also necessary to look at some customisation aspects of TshwaneLex. The remainder of this article will therefore proceed in reverse, from final product back to the heart of the software environment itself, using a variety of languages.

TshwaneLex in a Nutshell -Examples of Final Products
TshwaneLex is a professional lexicography software suite for the compilation of monolingual, bilingual and multilingual dictionaries, and for the publication of dictionaries in hardcopy, online and electronic formats.The application is currently in use by over two hundred users worldwide, from individuals to large organisations, and for over one hundred different languages.To date, over eighty people have received formal TshwaneLex training.

Customisation Aspects of TshwaneLex -From Final Products to the Compilation Environment
In Addendum 6 an example is shown of an existing monolingual dictionary, produced and placed online with TshwaneLex, for which both the dictionary data and the entire dictionary interface text are in the same language. In this case, a mother-tongue speaker of Sesotho sa Leboa is looking up the word matšulela '(traditional) grinding stones to grind grains' in the Pukuntšutlhaloši ya Sesotho sa Leboa ka Inthanete 'Explanatory Sesotho sa Leboa Dictionary on the Internet' (Mojela et al. 2004-2006). To date, and perhaps surprisingly, this work remains the only fully monolingual African-language dictionary available on the Internet, allowing the user to consult a reference work in a digital dictionary environment that is truly and entirely in his/her language.
Although seemingly straightforward, what a monolingual dictionary does not reveal is the separation between actual, unique dictionary contents and repetitive metalanguage (such as parts of speech, domain labels, etc.). If this separation is done properly, dynamic metalanguage customisation can, for example, instantly 'kick in' while a user consults an electronic bilingual (or multilingual) dictionary. An in-depth discussion of the notion of 'dynamic metalanguage customisation' may be found in De Schryver and Joffe (2005); suffice it here to point out that this implies, amongst others, that a dictionary's metalanguage is 'generated' (or customised, adapted) in real time in the language of the dictionary user. This, again, remains a world first, and is illustrated in Addendum 7 for Hillewaert et al.'s (2004-2006) Kamusi ya Kiswahili-Kiingereza Katika Mtandao/Online Swahili-English Dictionary, a reference work in progress compiled with TshwaneLex. In the screenshot shown in the background, a person browsing the dictionary in Swahili will (for the example shown) see the parts of speech labelled as kimilikishi nomino and kivumishi, while the cross-reference marker text will be displayed as Mzizi. For a person who browses the same dictionary in English, as shown in the screenshot in the foreground, these same metalanguage strings are automatically adapted to 'possessive pronoun' and 'adjective' respectively for the parts of speech, and to 'Root' for the cross-reference marker text.
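The principle behind dynamic metalanguage customisation can be sketched as follows. The sketch assumes a simple design in which an entry stores only language-neutral label IDs and the display strings are looked up in the user's interface language at display time; the IDs, the table layout and the function name are hypothetical and do not reflect TshwaneLex's internal format. The label strings are those of the Swahili-English example above.

```python
# Hypothetical metalanguage table: one language-neutral ID per label,
# with one display string per interface language ("sw" = Swahili, "en" = English).
METALANGUAGE = {
    "pos.possessive_pronoun": {"sw": "kimilikishi nomino", "en": "possessive pronoun"},
    "pos.adjective": {"sw": "kivumishi", "en": "adjective"},
    "xref.root": {"sw": "Mzizi", "en": "Root"},
}


def render_metalanguage(label_ids, user_language, fallback="en"):
    """Generate the metalanguage of an entry in the user's language at display time."""
    rendered = []
    for label_id in label_ids:
        strings = METALANGUAGE[label_id]
        # Use the fallback language if no string exists yet for the user's language.
        rendered.append(strings.get(user_language, strings[fallback]))
    return rendered


# The entry itself only stores IDs, so it never needs to be edited per language.
entry_labels = ["pos.possessive_pronoun", "pos.adjective", "xref.root"]
print(render_metalanguage(entry_labels, "sw"))
print(render_metalanguage(entry_labels, "en"))
```

In this design, adding a further interface language means adding one string per label ID to the table, without touching any dictionary entry.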
Apart from separating dictionary contents from metalanguage, good lexicography software should also clearly separate these data from the structure of the articles.In technical terms this means that the contents must be 'governed by' and thus 'conform to' a DTD (Document Type Definition, or dictionary grammar), with this DTD again markedly distinct from the actual formatting (or style) of the data.An in-depth discussion of the fully customisable and built-in TshwaneLex DTD and linked Styles system may be found in Joffe and De Schryver (2005); suffice it to point out for the present discussion that all aspects of the grammar and formatting may be created in the language of the user's choice.This is illustrated in Addendum 8, where a Cilubà-French Dictionary (Kabuta et al. 2006) is being compiled within TshwaneLex.The Tree View on the left is 'converted' into one of any number of displays in the Preview Area on the right.What is important to note here is that all the labels for the components of the Tree View are entirely customisable (and have been customised) by the lexicographer: Ngumvwìkìlà stands for 'Sense', Munyàku for 'Combination', Dikùdimuna for 'Translation equivalent', Cileejilu for 'Example', etc.
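The separation between the grammar and the labels the lexicographer sees can be sketched in the same spirit. In this hypothetical Python sketch, the grammar refers to element types only through internal IDs, while the Tree View labels live in a separate, freely editable table (filled here with the Cilubà labels of Addendum 8). Which element may contain which is an invented fragment for illustration, not the actual grammar of the Cilubà-French Dictionary; styles (formatting) would form a third, separate table that is not shown.

```python
# Hypothetical grammar fragment: element type ID -> element types it may contain.
# The grammar only ever uses these internal IDs.
GRAMMAR = {
    "sense": {"combination", "translation", "example"},
    "combination": {"translation", "example"},
    "translation": set(),
    "example": set(),
}

# Tree View labels chosen by the lexicographer, kept apart from the grammar.
TREE_LABELS = {
    "sense": "Ngumvwìkìlà",
    "combination": "Munyàku",
    "translation": "Dikùdimuna",
    "example": "Cileejilu",
}


def can_contain(parent: str, child: str) -> bool:
    """Validate structure against the grammar; labels play no role here."""
    return child in GRAMMAR[parent]


def tree_label(type_id: str) -> str:
    """Label shown in the Tree View; falls back to the internal ID if none is set."""
    return TREE_LABELS.get(type_id, type_id)


print(tree_label("sense"), can_contain("sense", "example"))
```

Renaming a label in TREE_LABELS changes what the lexicographer sees without affecting validation, which is the point of keeping the two apart.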
This tripartite division of the various levels of the data is also reflected in the layout of the TshwaneLex GUI (graphical user interface). As may be seen in Addendum 9, (1) the bottom-left quadrant is the area where the unique dictionary contents are typed in (into input boxes) and where the repetitive metalanguage is selected (from, for instance, drop-down menus), (2) the top-left quadrant is where the Tree View is constructed in conformance with the DTD, and (3) the right half is the Preview Area, displaying a possible output with any of a number of styles applied. Each of these levels is seamlessly linked to, and interacts with, the other levels, and each is also fully customisable, including with regard to the language used. In Addendum 9, all customisation aspects were created in Cilubà. Also visible (already) in this same screenshot is the fact that all the text strings of the GUI itself have been localised. This, however, is the topic of the next section.

Localisation Aspects of TshwaneLex -Manipulating the Compilation Environment
Typical tools for software localisation include terminology management systems, translation memory applications, localisation software, and project management tools (Schmitz 2006: slide 24).In TshwaneLex, however, all of these were brought together into a single powerful built-in Localisation editor.All relevant text strings that appear throughout the various menus, dialogue boxes, messages, tabs, buttons, the status bar, etc. are automatically presented in the Localisation editor.This can be seen in Addendum 10, where localisation of TshwaneLex is ongoing for Welsh.Also note that all strings where the English term 'View' appears were brought together in this example, which enables the translator to make sure that the translated terminology is consistent.One simply has to recall the haphazard translation of the word 'Internet' throughout the isiZulu LIP for Windows XP, for example, to realise that being able to bring related material together while translating is indeed an added advantage.
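The two editor functions described here, bringing together every string that contains a given term and spotting inconsistent renderings of the same original, can be sketched in Python as follows. The row layout (key, original, translation) and all keys are hypothetical; the three isiZulu renderings of 'Internet' are the ones reported for Addendum 4. Matching on whole words is a design choice of this sketch (so 'Preview' is not grouped with 'View'); the article does not say how TshwaneLex matches.

```python
import re
from collections import defaultdict

# Hypothetical string table rows: (key, English original, translation).
# An empty translation means the string has not been translated yet.
ROWS = [
    ("StartMenu.Internet", "Internet", "intanethi"),
    ("Desktop.Internet", "Internet", "Internet"),
    ("Taskbar.Internet", "Internet", "Intanethi"),
    ("Menu.View", "View", ""),
    ("Panel.TreeView", "Tree View", ""),
    ("Panel.Preview", "Preview Area", ""),
]


def rows_containing(rows, term):
    """Bring together every row whose English original contains `term` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [row for row in rows if pattern.search(row[1])]


def inconsistent_translations(rows):
    """Map each English original to its distinct translations, keeping only conflicts."""
    seen = defaultdict(set)
    for _key, original, translation in rows:
        if translation:
            seen[original].add(translation)
    return {original: sorted(found) for original, found in seen.items() if len(found) > 1}


print([key for key, _, _ in rows_containing(ROWS, "View")])
print(inconsistent_translations(ROWS))
```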
Another powerful feature that helps ensure consistency and contextual correctness is an indication of where in the GUI each particular text string appears. This is shown in the column headed 'Key' in Addendum 10. This first column, as well as the columns with the original and translated strings, can also be sorted by simply clicking on the respective headers, which again enables one to bring related material together.
Any number of localised versions of the GUI can be prepared, and each immediately becomes available within TshwaneLex. Further note that all the text strings are kept in a single file per language, which means, for instance, that all the text of each localised version can easily be spellchecked in one pass. In this way, non-words such as gongweor*, which, as discussed above, found its way into the Setswana LIP for Windows XP, would be picked up instantly.
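Keeping one file per language makes a single spellchecking pass straightforward. The sketch below assumes a hypothetical 'key = translation' line format, invented keys, and a tiny word list made up only of the Setswana words cited in this article; a real check would use a full Setswana lexicon or spellchecker.

```python
import io
import re

# Hypothetical single file holding all Setswana strings: one "key = translation" per line.
SETSWANA_TEXT = (
    "# Setswana strings\n"
    "ControlPanel.AddRemove = Fetola gongweor tlosa Diporokeramo\n"
    "Wizard.Previous = Morago\n"
    "Wizard.Next = E e latelang\n"
)

# Toy lexicon limited to the Setswana words quoted in the article.
LEXICON = {"fetola", "gongwe", "tlosa", "diporokeramo", "morago", "e", "latelang"}


def load_strings(handle):
    """Read every translated string of one language from a single file."""
    strings = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, text = line.partition("=")
        strings[key.strip()] = text.strip()
    return strings


def spellcheck(strings, lexicon):
    """Check all translations in one pass; return (key, word) pairs not in the lexicon."""
    problems = []
    for key, text in strings.items():
        for word in re.findall(r"\w+", text):
            if word.lower() not in lexicon:
                problems.append((key, word))
    return problems


print(spellcheck(load_strings(io.StringIO(SETSWANA_TEXT)), LEXICON))
```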
Particularly handy, finally, is the fact that the results of the localisation can be seen in real time within TshwaneLex itself by simply clicking 'Apply [Partial]'.As an illustration, random examples of localised sections of TshwaneLex are shown in Addendum 11, for both Welsh and Cilubà.

Internationalisation Aspects of TshwaneLex -From the Compilation Environment to the Program Code
In order to achieve the localisation aspects discussed in the previous section, TshwaneLex of course had to be internationalised. Simply put, this internationalisation was achieved through a strict separation between the actual (C++) program code on the one hand, and all the textual data that appears in the GUI of TshwaneLex on the other. This internationalisation, combined with the fact that Unicode is supported on all levels throughout the application, means that TshwaneLex can now easily be adapted to other markets by the software users themselves. Users can even select/create localised terms suitable to their purposes. There is, however, an additional advantage to the internationalisation of TshwaneLex from a developer's angle, namely that it was an important step towards being able to generate multiple applications (TshwaneLex 1.0, TshwaneLex 2.0, TshwaneTerm 1.0, TshwaneTerm 2.0, etc.) from the same code base.
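The separation between program code and textual data can be sketched as follows (in Python rather than the C++ of TshwaneLex, purely for brevity). Program code only ever asks for a key; the text comes from a table that can be swapped at run time. The class name, the keys and the fallback rule are assumptions for illustration; in particular, letting untranslated keys fall back to English is one plausible way of applying a partial localisation, not a description of how 'Apply [Partial]' works internally.

```python
class StringTable:
    """Program code asks for keys, never for literal UI text."""

    def __init__(self, default_strings):
        self.default = dict(default_strings)  # built-in English strings
        self.active = {}  # current localisation, possibly partial

    def apply(self, translations):
        """Swap in a (possibly partial) set of translations; used from the next lookup on."""
        self.active = dict(translations)

    def text(self, key):
        # Untranslated or empty entries fall back to the default language.
        return self.active.get(key) or self.default[key]


ui = StringTable({"Button.Previous": "Previous", "Button.Next": "Next"})
ui.apply({"Button.Previous": "Morago"})  # only one string translated so far
print(ui.text("Button.Previous"), ui.text("Button.Next"))
```

The same fallback explains why a partial localisation shows a mix of languages on screen: acceptable while translating, but not in a released product.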

Conclusion: Candidates for 'Blowback Localisation'
Implicit in this article was the conviction that localisation in lexicography can only be truly successful if, in addition to the localisability of the graphical user interface, the dictionary contents on the one hand, and the building blocks used to bring those contents together on the other, are customisable to such a degree that the (meta)language can also be adapted on those levels. With specific regard to this internationalisation and localisation study of the lexicography software TshwaneLex, the following elements appear to be good candidates for 'blowback localisation':
• put the localisation in the hands of the users themselves, through a built-in localisation editor, without the need for any additional software;
• enable all the text strings to be searched for, brought together, and sorted in various ways, so as to ensure consistency in the translations;
• make sure all the translated textual material is always seen in context;
• ensure easy spellchecking of all the translated material in a single pass;
• allow users to add and change any number of languages, in addition to the default one(s);
• consider the instant visualisation of the result.
Summarising, and returning to the software under discussion: while one is using TshwaneLex to localise TshwaneLex, the emerging localised version of TshwaneLex appears in front of the translator's eyes, arguably the ultimate in localisation.

Addendum 1: Microsoft's Outlook Express with the Setswana LIP (note the large amount of untranslated material)
Addendum 2: Installing Microsoft's LIP for Setswana (note the Morago 'Previous' and E e latelang 'Next' buttons, and compare with the bottom right of Addendum 1)
• Ditokumente tsa Me 'My documents'
• Ditshwantsho tsa Me 'My Pictures'
• Khomphiutara ya Me 'My Computer'
• Mafelofaratlhatlha a me 'my Network Places'
Ironically, the same problem reappears in the isiZulu version when clicking on the Qala 'Start' button, as can be seen in Addendum 3, right.
Regrettably, such inconsistencies are far too numerous throughout Windows XP, as is exemplified in Addendum 4 for the isiZulu LIP, where one notices as many as three different spellings of 'Internet': intanethi in the top-left corner, Internet in the middle, and Intanethi (as well as Internet again on the same line!) in the bottom right. According to Mariëtta Alberts (personal communication at AFRILEX 2006, 7 July 2006), the three Technical Committees: Terminology Development of PanSALB's National Language Bodies for isiZulu, Setswana and Afrikaans were requested to verify the terminology of the LIP glossaries for the said languages. All three committees complained about inconsistencies in the translations as well as the strange use of capitalisation. PanSALB therefore stated clearly that its National Language Bodies cannot authenticate the LIPs prior to proper feedback from stakeholders.