PCEDT & Multilingual Corpora

Prague Czech-English Dependency Treebank

The Prague Czech-English Dependency Treebank (PCEDT) is a manually annotated, aligned parallel treebank built on the Penn Treebank Wall Street Journal text collection. It has been released in two versions. The current version has over 1.2 million running words in almost 50,000 sentences for each language part. Each language part carries comprehensive manual linguistic annotation in the style of the Prague Dependency Treebank 2.0 (PDT 2.0). ...

HamleDT: HArmonized Multi-LanguagE Dependency Treebank

HamleDT is a compilation of existing dependency treebanks (or dependency conversions of other treebanks), transformed so that they all conform to the same annotation style. ... HamleDT currently integrates 29 treebanks. The subset of treebanks whose license terms permit redistribution can be downloaded directly from us. ...


Other Parallel and/or Multilingual Corpora

Project | Tags
--- | ---
Czech Malach Cross-lingual Speech Retrieval Test Collection | Corpora, Data, Information Retrieval, Multilingual, Speech Retrieval
CzEng | Corpora, Data, Machine Translation, Multilingual
CzEngVallex - Czech and English verbal valency | Annotations, Corpora, Data, Lexicons, Machine Translation, Multilingual, Semantics, Taggers
Deep Universal Dependencies | Annotations, Coreference, Corpora, Data, Morphology, Multilingual, Multiword Expressions, Semantics, Syntax, Valency
HamleDT | Annotations, Corpora, Data, Multilingual, Parsers
HindEnCorp | Corpora, Data, Machine Translation, Monolingual, Multilingual
Hindi Visual Genome | Corpora, Data, Machine Translation, Multi-modality, Multilingual
Interset | Corpora, Data, Morphology, Multilingual, Taggers, Tools
Lindat KonText | Annotations, Corpora, Data, Monolingual, Multilingual, Tools
Multilingual Corpus Annotation as a Support for Language Technologies | Annotations, Coreference, Corpora, Data, Discourse, Information Structure, Multilingual
PAWS (Parallel Anaphoric Wall Street Journal) | Annotations, Coreference, Corpora, Data, Linked data, Multilingual
Prague Czech-English Dependency Treebank | Annotations, Corpora, Data, Lexicons, Linked data, Multilingual, Valency
Prague Czech-English Dependency Treebank 2.0 Coref | Annotations, Coreference, Corpora, Data, Linked data, Multilingual
Prague Database of Spoken Language 1.0 | Annotations, Corpora, Data, Dialog, Multi-modality, Multilingual, Speech Recognition, Speech Retrieval
QT21 | Corpora, Data, Lexicons, Linked data, Machine Learning, Machine Translation, Multilingual, Semantics, Tools
Slovakoczech NLP workshop | Annotations, Coreference, Corpora, Data, Dialog, Discourse, Information Retrieval, Information Structure, Lexicons, Linked data, Machine Learning, Machine Translation, Monolingual, Morphology, Multi-modality, Multilingual, Multiword Expressions, Parsers, Publications, Semantics, Speech Recognition, Speech Retrieval, Spellcheckers, Taggers, Tools, Valency
UFAL Medical Corpus | Corpora, Data, Machine Translation, Multilingual
Universal Dependencies | Annotations, Corpora, Data, Morphology, Multilingual, Parsers
W2C | Corpora, Data, Multilingual