The Utilization of Parallel Corpora for the Extension of Machine Translation Lexicons

Jeanne Pienaar; G.D. Oosthuizen

doi:10.5788/7-1-975

Jeanne Pienaar, Department of Computer Science, University of Pretoria, Pretoria, South Africa
G.D. Oosthuizen, Department of Computer Science, University of Pretoria, Pretoria, South Africa

Keywords: alignment, bilingual corpora, corpus, extension, lexicon, machine translation, monolingual corpora, parallel corpora

Abstract

There has recently been a growing awareness of the importance of large collections of texts (corpora) as resources in machine translation research. Creating or extending machine translation lexicons is time-consuming, difficult and costly in terms of human involvement. Several research groups have explored the contribution that corpora can make towards reducing this cost, time and complexity. This article describes a system developed to identify word-pairs in an aligned bilingual (English-Afrikaans) corpus in order to extend a bilingual lexicon with words, and their translations, that are not yet present in the lexicon. New translations can also be added for existing entries, and the system applies grammar rules to identify the grammatical category of each word-pair. The system limits the involvement of the human translator and reduces the time, cost and effort needed to extend a bilingual lexicon.
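
The abstract does not say how word-pairs are identified, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a sentence-aligned corpus given as (English, Afrikaans) string pairs, whitespace tokenization, and a Dice co-occurrence score as a stand-in association measure. The names `candidate_pairs`, `extend_lexicon` and `min_dice` are hypothetical, and the grammar rules for assigning grammatical categories are omitted.

```python
from collections import Counter
from itertools import product


def candidate_pairs(aligned, lexicon, min_dice=0.8):
    """Propose (source, target, score) pairs that the lexicon lacks.

    Assumption: a Dice coefficient over sentence-level co-occurrence
    stands in for whatever measure the described system really uses.
    """
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src_sentence, tgt_sentence in aligned:
        # Count each word once per sentence (set), lower-cased.
        src_words = set(src_sentence.lower().split())
        tgt_words = set(tgt_sentence.lower().split())
        src_count.update(src_words)
        tgt_count.update(tgt_words)
        pair_count.update(product(src_words, tgt_words))

    proposals = []
    for (s, t), joint in pair_count.items():
        dice = 2 * joint / (src_count[s] + tgt_count[t])
        # Keep strong associations that are not already recorded.
        if dice >= min_dice and t not in lexicon.get(s, set()):
            proposals.append((s, t, dice))
    # Highest score first, then alphabetical for a stable order.
    return sorted(proposals, key=lambda p: (-p[2], p[0], p[1]))


def extend_lexicon(lexicon, proposals):
    """Return a copy of the lexicon with the proposed translations added."""
    extended = {s: set(ts) for s, ts in lexicon.items()}
    for s, t, _ in proposals:
        extended.setdefault(s, set()).add(t)
    return extended


# Toy data, invented for illustration only.
aligned = [
    ("the dog sleeps", "die hond slaap"),
    ("the cat sleeps", "die kat slaap"),
    ("the dog eats", "die hond eet"),
    ("the cat eats", "die kat eet"),
]
lexicon = {"the": {"die"}, "dog": {"hond"}}

proposals = candidate_pairs(aligned, lexicon)
extended = extend_lexicon(lexicon, proposals)
```

In this sketch a human translator would still review `proposals` before they are merged, which matches the abstract's claim that human involvement is limited rather than removed. Because `lexicon.get(s, set())` is consulted per translation, a new translation for an existing headword is proposed in the same way as an entirely new entry.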

Published

1997-11-30

How to Cite

Pienaar, J., & Oosthuizen, G. (1997). The Utilization of Parallel Corpora for the Extension of Machine Translation Lexicons. Lexikos, 7(1). https://doi.org/10.5788/7-1-975

Issue

Vol. 7 (1997)

Section

Navorsingsartikels / Research Articles

Copyright of all material published in Lexikos will be vested in the Board of Directors of the Woordeboek van die Afrikaanse Taal. Authors are free, however, to use their material elsewhere provided that Lexikos (AFRILEX Series) is acknowledged as the original publication source.

Creative Commons License CC BY 4.0