PDT-Vallex

The PDT-Vallex project aims to create a Czech valency lexicon linked to real texts in various Czech corpora. The lexicon has been built in close connection with the annotation of the Prague Dependency Treebank project (PDT) and its successors (mainly the Prague Czech-English Dependency Treebank project, PCEDT, and the Prague Dependency Treebank of Spoken Czech, PDTSC). It contains over 11,000 valency frames for more than 7,000 verbs that occurred in the PDT, PCEDT or PDTSC. It is available in an electronically processable format (XML) together with the aforementioned treebanks (to be viewed and edited with TrEd, the main annotation tool for PDT, PCEDT and PDTSC), and also in a more human-readable form (see the links above and below). The main feature of the lexicon is its linking to the annotated corpora: each occurrence of each verb is linked to the appropriate valency frame, with additional (generalized) information about its usage and surface morphosyntactic form alternatives.
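The linking between frames and corpus occurrences described above can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the actual PDT-Vallex XML schema or any TrEd API: the class names, field names, frame identifiers, slot labels and corpus node identifiers below are all hypothetical and were made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ValencyFrame:
    """One valency frame of a verb (hypothetical model, not the real schema)."""
    frame_id: str                 # made-up identifier for the frame
    slots: List[str]              # illustrative argument labels
    occurrences: List[str] = field(default_factory=list)  # linked corpus node ids


@dataclass
class VerbEntry:
    """A lexicon entry: one verb lemma with one or more frames."""
    lemma: str
    frames: Dict[str, ValencyFrame] = field(default_factory=dict)

    def add_frame(self, frame: ValencyFrame) -> None:
        self.frames[frame.frame_id] = frame

    def link_occurrence(self, frame_id: str, node_id: str) -> None:
        # Each corpus occurrence of the verb points to exactly one frame.
        self.frames[frame_id].occurrences.append(node_id)


def frames_by_frequency(entry: VerbEntry) -> List[ValencyFrame]:
    """Return the frames of an entry, most frequently attested first."""
    return sorted(entry.frames.values(),
                  key=lambda f: len(f.occurrences), reverse=True)


def build_demo_entry() -> VerbEntry:
    """Build a toy entry; all identifiers and labels are invented."""
    entry = VerbEntry(lemma="dát")
    entry.add_frame(ValencyFrame("demo-f1", ["ACT", "PAT", "ADDR"]))
    entry.add_frame(ValencyFrame("demo-f2", ["ACT", "PAT"]))
    entry.link_occurrence("demo-f2", "corpus-node-001")
    entry.link_occurrence("demo-f2", "corpus-node-002")
    entry.link_occurrence("demo-f1", "corpus-node-003")
    return entry
```

In this toy model, a query such as `frames_by_frequency(build_demo_entry())` lists the frames of one verb ordered by how many corpus occurrences are linked to each. The real lexicon additionally stores generalized information about usage and surface morphosyntactic forms, which the sketch leaves out.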

Resources and access

PDT-Vallex is available in several forms. First and foremost, it is part of the Prague Dependency Treebank 2.0. After the extensions described in the PDT-Vallex PhD dissertation, later also published as two books (please see our list of UFAL published books, and look for author "Uresova"), it has also been converted into a human-readable form in PDF format. It is also available in a browsable and partly searchable form with links to the texts that have been annotated with it (for more about the treebanks themselves, see PDT, PCEDT). Lastly, it is available in both the original and PDF formats in the new LINDAT repository.

User interface and searchable version

The searchable version, with examples from several annotated Czech corpora, can be found on the LINDAT/CLARIN user services and applications pages.

Related projects

EngVallex

PDT-Vallex is now complemented by EngVallex, a valency lexicon for English that follows the structure and labeling scheme of PDT-Vallex but is based on the English PropBank frame files. It has been used for the tectogrammatical annotation of the English side of the Prague Czech-English Dependency Treebank (PCEDT 2.0).

CzEngVallex

CzEngVallex is a bilingual valency dictionary built over the PCEDT parallel treebank, which links verb senses and their arguments explicitly in the electronic version of the dictionary. It can also be downloaded here.

How to cite

If you make use of PDT-Vallex, please use the following text to cite: Urešová, Zdeňka; Štěpánek, Jan; Hajič, Jan; Panevová, Jarmila and Mikulová, Marie, 2014, PDT-Vallex: Czech Valency lexicon linked to treebanks, LINDAT/CLARIN digital library at Institute of Formal and Applied Linguistics, Charles University in Prague, http://hdl.handle.net/11858/00-097C-0000-0023-4338-F.

 @misc{11858/00-097C-0000-0023-4338-F,
 title = {{PDT}-Vallex: Czech Valency lexicon linked to treebanks},
 author = {Ure{\v s}ov{\'a}, Zde{\v n}ka and {\v S}t{\v e}p{\'a}nek, Jan and Haji{\v c}, Jan and Panevov{\'a}, Jarmila and Mikulov{\'a}, Marie},
 url = {http://hdl.handle.net/11858/00-097C-0000-0023-4338-F},
 note = {{LINDAT}/{CLARIN} digital library at Institute of Formal and Applied Linguistics, Charles University in Prague},
 year = {2014} }