Lexical Network of Word-Formation Relations in Czech

DeriNet is a lexical network which models word-formation relations in the lexicon of Czech. Nodes of the network correspond to Czech lexemes, while edges represent derivational links (relations between derivatives and their base lexemes) or links connecting compounds with their base words.
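The network structure described above can be pictured with a small Python sketch. It is only an illustration under stated assumptions: the class `Lexeme` and the helpers `add_derivation`, `root_of`, and `family` are hypothetical names chosen for this example and do not reflect the DeriNet file format or any official DeriNet API. The sketch assumes that each lexeme has at most one derivational base, and the Czech words are illustrative rather than taken from the data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Lexeme:
    """A node of the network (hypothetical structure, not the DeriNet format)."""
    lemma: str
    pos: str
    # Derivational link to the base lexeme (assumed: at most one per lexeme).
    parent: Optional["Lexeme"] = field(default=None, repr=False)
    # Lexemes derived directly from this one.
    children: List["Lexeme"] = field(default_factory=list, repr=False)
    # Links from a compound to its base words (empty for non-compounds).
    compound_bases: List["Lexeme"] = field(default_factory=list, repr=False)


def add_derivation(base: Lexeme, derivative: Lexeme) -> None:
    """Record an edge between a derivative and its base lexeme."""
    derivative.parent = base
    base.children.append(derivative)


def root_of(lexeme: Lexeme) -> Lexeme:
    """Follow derivational links upwards until a lexeme without a base is reached."""
    while lexeme.parent is not None:
        lexeme = lexeme.parent
    return lexeme


def family(lexeme: Lexeme) -> List[str]:
    """Return the lemmas of the whole derivational family, root first (depth-first)."""
    result = []
    stack = [root_of(lexeme)]
    while stack:
        node = stack.pop()
        result.append(node.lemma)
        stack.extend(reversed(node.children))
    return result


# Illustrative derivational chain: učit -> učitel -> učitelka
ucit = Lexeme("učit", "V")
ucitel = Lexeme("učitel", "N")
ucitelka = Lexeme("učitelka", "N")
add_derivation(ucit, ucitel)
add_derivation(ucitel, ucitelka)

# Illustrative compound: zeměpis points to its two base words.
zeme = Lexeme("země", "N")
psat = Lexeme("psát", "V")
zemepis = Lexeme("zeměpis", "N", compound_bases=[zeme, psat])

members = family(ucitelka)
```

In this sketch, derivational links form trees (one base per derivative), while compound links are stored separately because a compound has more than one base word.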

The present version, DeriNet 2.0, contains 1 million lexemes (sampled from the MorfFlex dictionary) connected by 808 thousand derivational relations and 600 links pointing from compounds to their base words.

Compared to previous versions, DeriNet 2.0 uses a new format and contains new types of annotations:

  • annotation of morphological categories (for all DeriNet lexemes),
  • identification of root morphs (in 250k lexemes),
  • semantic labelling (150k relations assigned one of five labels),
  • compounding (for 600 lexemes), and
  • so-called fictitious lexemes (as a proof of concept).

More details on the DeriNet 2.0 format can be found in the paper by Jonáš Vidra et al. presented at the DeriMo 2019 workshop.

DeriNet 2.0 was released in May 2019. It is available in the LINDAT/CLARIN digital library under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License (CC BY-NC-SA 3.0).

For older versions of the DeriNet data see here.

Online DeriNet Tools

DeriNet data can be searched online using two versions of DeriNet Search. DeriSearch v2 shows all pieces of information stored in the data, while DeriSearch v1 displays only derivational relations (not compounding relations). The data can be viewed online using DeriNet Viewer.

Related projects

Universal Derivations (UDer)

DeriNet 2.0 is part of Universal Derivations (UDer), a collection of harmonized derivational resources for multiple languages. The current version (UDer 0.5) contains 11 derivational resources for 11 different languages, all harmonized to the DeriNet 2.0 format. See the UDer page for more details.

Word-formation networks for other languages (created in the DeriNet-like format)

The following resources were created in cooperation with Poznan University of Technology:

  • DeriNet-style derivational networks for Czech, French, Polish, and Spanish, created with a semi-supervised approach based on sequential pattern mining, as described in an article currently under review in the LRE journal (four generated networks plus our hand-annotated samples; for individual licenses, see README)
  • Polish Word-Formation Network v. 0.5 (under CC-BY-ND)
  • Spanish Word-Formation Network v. 0.5 (under CC-BY-ND)
    • by Mateusz Lango

Related publications: