PEDANT: Parallel Texts in Göteborg

  • Daniel Ridings, Språkbanken, Institutionen för svenska språket, Göteborgs universitet, S-412 98 Göteborg, Sweden
Keywords: SGML, parallel corpora, morphosyntactic encoding, lemmatization, multiword units, compound words, internet access


This article presents the status of the PEDANT project on parallel corpora at the Language Bank at Göteborg University. It describes how the corpus data can be accessed: over the internet, through standard applications, and with SGML-aware programming tools. The SGML format used to encode translation pairs is also outlined. The methods support everything from plain text to texts densely encoded with linguistic information.