A 38 Million Words Dutch Text Corpus and its Users

J.G. Kruyt, M.W.F. Dutilh

Abstract


The use of text corpora has increased considerably in the past few years, not only in lexicography but also in computational linguistics and language technology. Consequently, corpus data and expertise developed by lexicographical institutions have gained a broader scope of application. In the European context this has led to a revised view of corpus design. In line with these developments, the Institute for Dutch Lexicology (INL) has since 1994 been providing external access to steadily improving corpora via the Internet. In August 1996, the 38 Million Words Corpus was available for consultation by the international research community. The present paper reports on the characteristics of this corpus (design, text classification, linguistic annotation) and on its use, both in dictionary projects and in linguistic research. In spite of limitations with respect to corpus design, the INL corpora accessible via the Internet have proved to meet external needs. By providing these facilities, the INL has acquired much broader experience in corpus-building than before, which is essential for new, internal dictionary projects. Giving external access to corpus data that was developed primarily for internal purposes may be profitable for all parties involved.

 


Keywords


large electronic Dutch text corpus, corpus design, text classification, topic, publication medium, linguistic annotation, on-line access via Internet, corpus users


DOI: https://doi.org/10.5788/7-1-982


ISSN 2224-0039 (online); ISSN 1684-4904 (print)

Creative Commons License CC BY 4.0


