We present the Czech Court Decisions Dataset (CCDD), a dataset of 300 manually annotated court decisions published by the Supreme Court of the Czech Republic (SC) and the Constitutional Court of the Czech Republic (CC).
CCDD contains 150 documents published by the Supreme Court of the Czech Republic in 2012. We selected them randomly with respect to their distribution over senates. CCDD contains a further 150 documents published by the Constitutional Court of the Czech Republic from 2004 to 2012.
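One way to read "randomly with respect to their distribution over senates" is proportional stratified sampling: each senate contributes to the sample roughly in proportion to its share of all decisions. The sketch below illustrates that reading only; the exact procedure used for CCDD is not described here, and the function name, the `(doc_id, senate)` input format and the largest-remainder rounding are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(docs, n, seed=0):
    """Draw n document ids so that each senate's share of the sample
    is as close as possible to its share of `docs`.

    docs: list of (doc_id, senate) pairs (hypothetical input format).
    """
    if n > len(docs):
        raise ValueError("cannot sample more documents than are available")

    rng = random.Random(seed)
    by_senate = defaultdict(list)
    for doc_id, senate in docs:
        by_senate[senate].append(doc_id)

    total = len(docs)
    # Ideal (fractional) number of documents per senate.
    quotas = {s: n * len(ids) / total for s, ids in by_senate.items()}
    counts = {s: int(q) for s, q in quotas.items()}

    # Largest-remainder rounding: hand the leftover slots to the senates
    # whose fractional quota was truncated the most.
    leftover = n - sum(counts.values())
    ranked = sorted(quotas, key=lambda s: quotas[s] - counts[s], reverse=True)
    for s in ranked[:leftover]:
        counts[s] += 1

    sample = []
    for s, ids in by_senate.items():
        sample.extend(rng.sample(ids, counts[s]))
    return sample
```

With a fixed `seed` the selection is reproducible, which makes it possible to regenerate the same sample from the same document list.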
All 300 documents in CCDD are manually annotated using the following entities:
In addition, a court decision reference is linked to an institution entity if that institution issued the court decision. Links between applicabilities and act references are not annotated. Each applicability entity follows an act reference.
For manual annotation we used the web-based annotation tool Brat. Annotators mark entity occurrences and label them with the appropriate tag. They then mark relations between court decision references and institutions wherever such relations appear in the text.
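Brat stores annotations in a standoff format: a `.ann` file next to each text file, with tab-separated `T` lines for text-bound entities and `R` lines for binary relations. The following minimal sketch reads such a file into Python objects. It assumes the dataset is available in Brat's standoff format; the labels `Institution`, `CourtDecision` and `IssuedBy`, as well as the sample text, are placeholders invented for this example and may differ from the tag names used in CCDD.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    id: str
    label: str
    start: int
    end: int
    text: str


@dataclass
class Relation:
    id: str
    label: str
    arg1: str
    arg2: str


def parse_ann(ann_text):
    """Parse entity (T) and relation (R) lines of a Brat .ann file."""
    entities, relations = {}, []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            label, offsets = fields[1].split(" ", 1)
            # Discontinuous spans look like "s1 e1;s2 e2"; keep the overall extent.
            spans = [tuple(map(int, part.split())) for part in offsets.split(";")]
            text = fields[2] if len(fields) > 2 else ""
            entities[ann_id] = Entity(ann_id, label, spans[0][0], spans[-1][1], text)
        elif ann_id.startswith("R"):
            label, arg1, arg2 = fields[1].split()
            relations.append(
                Relation(ann_id, label, arg1.split(":", 1)[1], arg2.split(":", 1)[1])
            )
        # Other annotation types (events, attributes, notes) are ignored here.
    return entities, relations


# Placeholder document and annotations (not taken from the dataset).
SAMPLE_TEXT = "judgment of the Supreme Court no. 00 Xxx 0/0000"
SAMPLE_ANN = (
    "T1\tInstitution 16 29\tSupreme Court\n"
    "T2\tCourtDecision 30 47\tno. 00 Xxx 0/0000\n"
    "R1\tIssuedBy Arg1:T2 Arg2:T1\n"
)

entities, relations = parse_ann(SAMPLE_ANN)
for rel in relations:
    print(entities[rel.arg1].text, "->", entities[rel.arg2].text)
```

The character offsets index into the accompanying text file, so `SAMPLE_TEXT[e.start:e.end]` recovers the annotated string for a contiguous entity `e`.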
We annotated each of the 300 court decisions once. To estimate the inter-annotator agreement (IAA), we additionally selected 15 random documents from the dataset and had three independent annotators annotate them. On average, the annotators marked 551 institutions, 258 references to court decisions, 402 references to acts and 42 applicabilities. We used Fleiss' kappa to calculate the agreement. We report
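For reference, Fleiss' kappa compares the observed agreement among a fixed number of annotators with the agreement expected by chance. The sketch below is a plain-Python illustration of the standard formula. It assumes agreement is measured per token, with each annotator assigning one label (an entity tag or `O` for "no entity") to every token; how the units were actually defined for CCDD is not stated here, and the label names are placeholders.

```python
import math


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items rated by the same number of annotators.

    ratings: list of items, each a list with one label per annotator.
    categories: list of all possible labels.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    # counts[i][j] = number of annotators who assigned item i to category j.
    counts = [[item.count(c) for c in categories] for item in ratings]

    # Observed agreement: mean over items of the share of agreeing annotator pairs.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement from the overall category proportions.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(len(categories))
    ]
    p_e = sum(p * p for p in p_cat)

    if p_e == 1.0:
        # All ratings fall into a single category; kappa is undefined.
        return math.nan
    return (p_bar - p_e) / (1.0 - p_e)


# Toy example: four tokens, three annotators (placeholder labels).
toy = [
    ["INST", "INST", "INST"],
    ["O", "O", "O"],
    ["ACT", "ACT", "O"],
    ["O", "O", "O"],
]
print(fleiss_kappa(toy, ["INST", "ACT", "O"]))
```

A value of 1 means perfect agreement and a value near 0 means agreement no better than chance.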
The table below presents statistics on the 300 annotated documents, averaged over 10 cross-validation folds:
| SC | # of tokens | train | 43,117 | 11,074 | 1,262 | 12,425 | 332,535 |
|    | # of entities | train | 3,949 | 1,304 | 222 | 2,485 | 7,487 |
| CC | # of tokens | train | 19,675 | 12,780 | 843 | 14,767 | 312,191 |
|    | # of entities | train | 2,338 | 1,481 | 210 | 3,206 | 7,910 |
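To make "averaged over 10 cross-validation folds" concrete, the sketch below shows one way such a training-set statistic could be computed: split the documents into 10 folds, treat each fold in turn as the held-out part, and average the size of the remaining training part. The fold assignment (a shuffled round-robin split) and the function name are assumptions for illustration; the actual fold definition used for CCDD is not described here.

```python
import random


def average_train_size(doc_sizes, k=10, seed=0):
    """Average training-set size over k cross-validation folds.

    doc_sizes: one number per document, e.g. its token or entity count.
    """
    indices = list(range(len(doc_sizes)))
    random.Random(seed).shuffle(indices)
    # Round-robin assignment of shuffled documents to k folds.
    folds = [indices[i::k] for i in range(k)]

    total = sum(doc_sizes)
    # For each fold, the training part is everything outside that fold.
    train_sizes = [total - sum(doc_sizes[i] for i in fold) for fold in folds]
    return sum(train_sizes) / k
```

Because every document is held out exactly once, this average always equals (k - 1) / k of the corpus total, so with 10 folds a training-set figure corresponds to 90% of the full count.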
The table below presents average entity lengths in tokens. The minimum reference length is four tokens. Judging by entity length, act references are the most complex entities and institution references the simplest.
Distributed under the CC BY-NC-SA 4.0 licence.
We gratefully acknowledge support from the Technology Agency of the Czech Republic (grant no. TA02010182).