This version of the corpus is outdated. Please use Czech Named Entity Corpus 1.1.
The aim of Named Entity Recognition (NER) is to identify proper names in text and to classify them into predefined categories such as names of persons, geographical names, and names of organizations. NER is motivated by the needs of Natural Language Processing (NLP) applications such as information extraction and machine translation. As with most other NLP tasks, annotated data are valuable when developing a named entity recognizer, especially for training and evaluation. The Czech Named Entity Corpus 1.0 presented here is the first publicly available corpus providing a large body of manually annotated named entities in Czech sentences, including a fine-grained classification.
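To illustrate how annotated entities can serve for evaluation, the following is a minimal sketch. It assumes that each entity is stored as a half-open token span with a category label and that a recognizer is scored by exact-match precision, recall and F1. The `Entity` type, the `evaluate` function and the coarse labels (`person`, `geographical`, `organization`) are hypothetical. They are not the corpus's file format, its fine-grained tag set, or an official evaluation procedure.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """Hypothetical representation: a token span [start, end) plus a category label."""
    start: int
    end: int
    label: str


def evaluate(gold: set[Entity], predicted: set[Entity]) -> tuple[float, float, float]:
    """Exact-match scoring: a predicted entity counts as correct only if both
    its span boundaries and its label match a gold entity."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# "Václav Havel visited Brno." with illustrative gold and system annotations.
tokens = ["Václav", "Havel", "navštívil", "Brno", "."]
gold = {Entity(0, 2, "person"), Entity(3, 4, "geographical")}
predicted = {Entity(0, 2, "person"), Entity(3, 4, "organization")}

for e in sorted(predicted):
    print(" ".join(tokens[e.start:e.end]), e.label)

p, r, f = evaluate(gold, predicted)
print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # prints P=0.50 R=0.50 F1=0.50
```

Here the recognizer finds "Brno" with the correct boundaries but the wrong category, so under exact matching it earns no credit for that entity. Fine-grained classifications make such label mismatches more likely, so an evaluation might also report span-only matching separately.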
Named entities are saved in the following formats:
Czech Named Entity Corpus 1.0 can be downloaded from the LINDAT/CLARIN repository.