Welcome to the CoNLL Shared Task 2009 website
(2012) All data from this task (with the exception of Japanese, due to licensing problems) are now available from the LDC, Catalog Nos. LDC2012T03 and
They are no longer available from this site.
(May 19 2010) The outputs of all the participating systems are now available for download. See the Results page.
(June 5) After yesterday's presentations and the poster session at Boulder, the results (system scores) are now open to the public.
(June 3) Program updated with location, cancellations, chair, etc.
(May 5) P-columns' syntactic accuracy fixed in results tables (thanks to Pierre Nugues).
(May 3) Program and instructions for presenters now available.
(Mar 31) Small typo in Mihai Surdeanu's name discovered and corrected in both the .doc reference file and the bibtex reference list on the Paper Submission page. Please re-insert into your paper. Thanks go to Pierre Nugues for noticing and reporting the bug!
(Mar 27) Bugfix in the scorer: the value of the Exact Semantic Match (percentage of sentences with correct semantic information) changed slightly. Download the new version if you want to use this value. This change does not affect the original scores, since the Exact Match has not been reported in the tables on the Results page.
(Mar 26) The bottom of the Evaluation Data page now contains instructions to download the full Gold Evaluation data.
(Mar 23) The Results page now adds the accuracy figures for the parsers supplied for the SRL-only task. They might be useful for evaluating and discussing correlation between the parsing and SRL components in your system papers.
(Mar 21) The Paper submission guidelines are now up. Please note that the deadline has been extended by one day - it is now March 31, midnight HST.
(Mar 21) The results are up on the Results page.
(Mar 21) The evaluation period is over. System outputs cannot be uploaded anymore. Please start working on your system description paper...
(Mar 16) Results page up: no results yet, but you can check the status of your upload as we process it further towards evaluation.
(Mar 13) Deadline extension: system output upload by two days (now at March 20, midnight HST), paper submission (details TBA) by one day (now at March 31, midnight HST).
(Mar 13) Final version of the scorer is available. This is the scorer that will be used to evaluate your systems' output. It is the beta2 with only slightly cleaner code; results should be identical.
(Mar 11) Evaluation data available. Good luck!
(Mar 1) New version (beta 2) of the scorer released. Fixes: bug in frame comparison and bug in hash dereference (thanks to Paul Bedaride).
(Feb 19) Version 'C' of Catalan data has been released. Please download it from our website's data page, using your original username and ID.
(Feb 17) Version 'D' of Chinese data released. It fixes frame files problems, non-matching lemmas vs. predicates, and more. Please follow the download instructions that will be sent to you shortly by the LDC.
(Feb 17) The scorer has been released.
(Feb 12) Version 'B' of German data released. Only filenames for frame files have changed - if your OS supports Unicode filenames, you do not have to worry. If not, or if you want to be compatible with the others, use the new German distribution package from our data page, using your original username and ID.
(Feb 11) (NOW OBSOLETE) Version 'C' of Chinese data released. It fixes some underscore problems in the training data file. Please follow the download instructions that will be sent to you shortly by the LDC.
(Feb 9) Version 'B' of the Chinese (CHINESE 'B' NOW OBSOLETE) Czech and English data is available. These are distributed by the LDC - you do not have to do anything but wait for the download instructions from them. New license holders will automatically get the new version. The differences are small (against the original 'A' version) - they merely incorporate the patches/diffs published earlier and some new changes in the Czech training and development data.
(Feb 6) (NOW OBSOLETE) Bugfix (diff file) for the Chinese training data corpus published on the "Training Data Download (CORRECTION No. 2)" page. Since it is only a few lines which have to be changed, we have decided not to make a 'B' version yet; we will do so only if more problems are discovered.
(Jan 28) (NOW OBSOLETE) Bugfix (diff file) for the English training data corpus published on the "Training Data Download (CORRECTION No. 1)" page. Since it is only ONE character which has to be changed, we have decided not to make a 'B' version yet; we will do so only if more problems are discovered.
(Jan 26) Training and Development Data Update: Spanish and Catalan version 'B' available. Please go to the "Training Data Download" page and download these two new datasets. Difference vs. version 'A': frame files updated, and the filenames do not contain non-ascii characters anymore.
(Jan 19) Training and Development Data Available
(Jan 8) Corrected Trial Data available for some languages. Please reload. Also, a "bonus" visualization of the trial data is now available from the Trial Data Download page.
(Jan 5) Trial Data available (follow the "Trial Data Download" link in the main menu on the left)
(Dec 23) Task Description posted and registration open: Please register on or before January 5, 2009.
(Dec 7) Website created
The Task: Syntactic and Semantic Dependencies in Multiple Languages
The task for CoNLL-2009 is an extension of the CoNLL-2008 shared task to multiple languages (English plus Catalan, Chinese, Czech, German, Japanese and Spanish). The core task of (jointly) extracting syntactic and semantic dependencies and the main evaluation scheme and methodology remain unchanged, with several new twists to keep the task interesting for those who already took part in the English-only task in 2008. Among the new features are compatible evaluation across several languages and their comparison, comparison of time and space complexity based on participants' input, and learning-curve comparison for languages with large datasets.
The data contents and format will be similar to the CoNLL-2008 shared task whenever possible, depending also on the source treebanks being used. The shared task data will thus have the following features:
- The syntactic and semantic dependencies will be represented directly.
- The contents of the datasets will allow for joint learning of both syntactic and semantic dependencies and their labeling.
- Tools will be provided whenever possible to help with the dependency analysis of the languages involved.
- The contents and format of the data will enable the participants to build on the previous CoNLL shared tasks on semantic labeling and dependency parsing.
- The format of the non-English datasets will be identical in form, and close to identical in content, to the English data. Participants will thus be required to submit results for all the languages provided.
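As a rough illustration of working with such column-based data, the sketch below reads sentences in a CoNLL-2009-style tab-separated layout (one token per line, blank lines between sentences). The column names follow the published CoNLL-2009 format (ID, FORM, LEMMA, PLEMMA, POS, PPOS, FEAT, PFEAT, HEAD, PHEAD, DEPREL, PDEPREL, FILLPRED, PRED, plus one APRED column per predicate), but the two-token sample sentence is invented for illustration and is not taken from this page.

```python
# Hedged sketch of a CoNLL-2009-style reader; the sample sentence below
# is invented for illustration, not real shared-task data.
COLUMNS = ["id", "form", "lemma", "plemma", "pos", "ppos", "feat", "pfeat",
           "head", "phead", "deprel", "pdeprel", "fillpred", "pred"]

def read_sentences(lines):
    """Yield sentences as lists of token dicts; blank lines end a sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        fields = line.split("\t")
        token = dict(zip(COLUMNS, fields))
        # Any fields beyond the fixed columns are the per-predicate
        # argument columns (one APRED column per predicate in the sentence).
        token["apreds"] = fields[len(COLUMNS):]
        sentence.append(token)
    if sentence:
        yield sentence

# Invented two-token sample: "Mary sings", with one predicate (sing.01)
# and one argument column.
sample = (
    "1\tMary\tmary\tmary\tNNP\tNNP\t_\t_\t2\t2\tSBJ\tSBJ\t_\t_\tA0\n"
    "2\tsings\tsing\tsing\tVBZ\tVBZ\t_\t_\t0\t0\tROOT\tROOT\tY\tsing.01\t_\n"
)

for sent in read_sentences(sample.splitlines(True)):
    for tok in sent:
        print(tok["form"], tok["head"], tok["deprel"], tok["apreds"])
```

The gold and predicted columns (e.g. HEAD vs. PHEAD, DEPREL vs. PDEPREL) are kept side by side, so the same reader serves for both training data and system output.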
Shared Task Submissions
Participants are required to register (see the link on the left). Also watch this page for updated information about data formats, the availability of trial data, etc., and, most importantly, how to upload system test data results for evaluation.
Participants will have the opportunity to submit their system descriptions and results to a separate track of the CoNLL-2009 conference.
Please register if you intend to participate. We will then be able to contact you directly with news, data availability, etc.
The Dates and Deadlines
- Registration of participants - January 5th
- Release of a small trial data set - January 5th
- Release of training and development data sets - January 19th
- Release of test data - March 11th
- Submission of test runs - March 20th (extended from March 18th)
- Submission of papers - March 31st (extended from March 30th)
- Notification of acceptance - April 8th
- Camera-ready deadline - April 15th
- CoNLL-2009 at NAACL - June 4th and 5th

The Organizing Team
- Jan Hajič (chair), Institute of Formal and Applied Linguistics, Charles University in Prague, hajic (at) ufal.mff.cuni.cz
- Massimiliano Ciaramita, Google, Inc., Zurich (Switzerland)
- Richard Johansson, Lund University (Sweden)
- Daisuke Kawahara, NICT (Japan)
- Maria Antònia Martí, University of Barcelona (Spain)
- Lluís Màrquez, Universitat Politècnica de Catalunya, Barcelona (Spain)
- Adam Meyers, New York University (USA)
- Joakim Nivre, Uppsala University (Sweden)
- Jan Štěpánek, Charles University, Prague (Czech Republic)
- Sebastian Padó, Stanford University (USA)
- Pavel Straňák, Charles University, Prague (Czech Republic)
- Mihai Surdeanu, Stanford University (USA)
- Nianwen (Bert) Xue, University of Colorado, Boulder (USA)
- Yi Zhang, Saarland University, Saarbrücken (Germany)