Článek popisuje metodu automatické identifikace deverbativ na základě slovníků a manuálně a automaticky anotovaných korpusů. Metoda je evaluována na novém testovacím korpusu, který je rovněž zveřejněn.
In this paper, we present an attempt to automatically identify Czech deverbative nouns using several methods that use large corpora as well as existing lexical resources. The motivation for the task is to extend a verbal valency (i.e., predicate-argument) lexicon by adding nouns that share the valency properties with the base verb, assuming their properties can be derived (even if not trivially) from the underlying verb by deterministic grammatical rules. At the same time, even in inflective languages, not all deverbatives are simply created from their underlying base verb by regular lexical derivation processes. We have thus developed hybrid techniques that use
both large parallel corpora and several standard lexical resources. Thanks to the use of parallel
corpora, the resulting sets contain also synonyms, which the lexical derivation rules cannot get. For evaluation, we have manually created a gold dataset of deverbative nouns linked to 100 frequent Czech verbs since no such dataset was initially available for Czech.