Lexical Network of Derivational Word-Formation Relations in Czech
The lexical network DeriNet captures core word-formation relations over a set of around 970 thousand Czech lexemes. The network is currently limited to derivational relations because derivation is the most frequent and most productive word-formation process in Czech. This limitation is reflected in the architecture of the network: each lexeme can be linked to at most one base word, so composition and combined processes (composition with derivation) are not included.
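The single-base-word restriction described above means that each group of derivationally related lexemes forms a tree. The following is a minimal sketch of that idea in Python, not the DeriNet data format or API: the class `Lexeme`, its method `set_base`, and the helper `derivation_chain` are hypothetical names introduced for illustration, and the Czech lexemes in the demo are illustrative examples rather than entries quoted from the released data.

```python
from typing import List, Optional


class Lexeme:
    """A node in a derivational network (hypothetical illustration)."""

    def __init__(self, lemma: str) -> None:
        self.lemma = lemma
        self.base: Optional["Lexeme"] = None   # at most one base word
        self.derived: List["Lexeme"] = []      # any number of derivatives

    def set_base(self, base: "Lexeme") -> None:
        """Link this lexeme to its base word, enforcing the tree shape."""
        if self.base is not None:
            raise ValueError(f"{self.lemma} already has a base word")
        # Walk up from the proposed base; reaching self would create a cycle.
        node: Optional["Lexeme"] = base
        while node is not None:
            if node is self:
                raise ValueError("link would create a cycle")
            node = node.base
        self.base = base
        base.derived.append(self)


def derivation_chain(lexeme: Lexeme) -> List[str]:
    """Return the lemmas from the root of the tree down to `lexeme`."""
    chain: List[str] = []
    node: Optional[Lexeme] = lexeme
    while node is not None:
        chain.append(node.lemma)
        node = node.base
    chain.reverse()
    return chain


if __name__ == "__main__":
    # Illustrative lexemes (assumed for the demo, not read from DeriNet).
    ucit = Lexeme("učit")
    ucitel = Lexeme("učitel")
    ucitelka = Lexeme("učitelka")
    ucitel.set_base(ucit)
    ucitelka.set_base(ucitel)
    print(derivation_chain(ucitelka))
```

Under this design a compound, which would need two base words, cannot be represented: a second call to `set_base` on the same lexeme raises an error, mirroring the reason composition is left out of the network.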
The network was initialized with a set of lexemes whose existence was supported by corpus evidence. Derivational links were created using three sources of information: links delivered by a tool for morphological analysis, links based on an automatically discovered set of derivation rules, and links based on a grammar-based set of rules.
The current version of DeriNet is 1.4, released in March 2017. It is available under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (CC-BY-NC-SA).
Slides from our talk on December 15, 2014 (synchronized with release 0.9):