Notes on Compiling a Corpus- Based Dictionary

František Čermák

doi:10.5788/20-0-156

František Čermák Institute of the Czech National Corpus, Faculty of Philosophy, Charles University, Prague, Czech Republic

Keywords: MONOLINGUAL DICTIONARIES, CORPUS LEXICOGRAPHY, SYNTAGMATICS AND PARADIGMATICS IN DICTIONARIES, DICTIONARY ENTRY, TYPES OF LEMMA, PRAGMATICS, TREATMENT OF MEANING, POLYSEMY, CZECH

Abstract

ABSTRACT: On the basis of sample analysis of a Czech adjective, a definition based on the data drawn from the Czech National Corpus (cf. ÄŒermÃ¡k and SchmiedtovÃ¡ 2003) is gradually compiled and finally offered, pointing at the drawbacks of definitions found in traditional dictionaries. Steps undertaken here are then generalized and used, in an ordered sequence (similar to a work-flow ordering), as topics, briefly discussed in the second part to which lexicographers of monolingual dictionaries should pay attention. These are supplemented by additional remarks and caveats useful in the compilation of a dictionary. Thus, a brief survey of some of the major steps of dictionary compilation is presented here, supplemented by the original Czech data, analyzed in their raw, though semiotically classified form.OPSOMMING: Aantekeninge oor die samestelling van 'n korpusgebaseerde woordeboek. Op grond van 'n steekproefontleding van 'n Tsjeggiese adjektief, word 'n definisie gebaseer op data ontleen aan die Tsjeggiese Nasionale Korpus (cf. ÄŒermÃ¡k en SchmiedtovÃ¡ 2003) geleidelik saamgestel en uiteindelik aangebied wat wys op die gebreke van definisies aangetref in tradisionele woordeboeke. Stappe wat hier onderneem word, word dan veralgemeen en gebruik in 'n geordende reeks (soortgelyk aan 'n werkvloeiordening), as onderwerpe, kortliks bespreek in die tweede deel, waaraan leksikograwe van eentalige woordeboeke aandag behoort te gee. Hulle word aangevul deur bykomende opmerkings en waarskuwings wat nuttig is vir die samestelling van 'n woordeboek. Op dié manier word 'n kort oorsig van sommige van die hoofstappe van woordeboeksamestelling hier aangebied, aangevul deur die oorspronklike Tsjeggiese data, ontleed in hul onbewerkte, alhoewel semioties geklassifiseerde vorm.Sleutelwoorde: EENTALIGE WOORDEBOEKE, KORPUSLEKSIKOGRAFIE, SINTAGMATIEK EN PARADIGMATIEK IN WOORDEBOEKE, WOORDEBOEKINSKRYWING, SOORTE LEMMAS, PRAGMATIEK, BEHANDELING VAN BETEKENIS, POLISEMIE, TSJEGGIES