2017/04/01 Our proposal for DARIAH CZ infrastructure has successfully passed the 1st round of evaluation. We are now working with our 9 partner institutions on a full proposal for the 2nd round.
2017/04/01 Two new NAKI II projects have started, one of which I coordinate (VIADAT). We also participate in another project coordinated by our colleagues at the University of West Bohemia, with the Institute for the Study of Totalitarian Regimes.
2016/03/01 I have started my Adjunct Professor position at the University of Colorado in Boulder, Department of Computer Science, working with Martha Palmer and others on Computational Linguistics, NLP and new multilingual language resources.
2016/01/01 The second phase of the LINDAT/CLARIN Research Infrastructure, of which I am the PI, has started. Its budget and duration were reduced by the government decision of 2015/12/21, but it will allow us to continue our work on language resources and services.
2015/04/28 At the Riga Summit on the European Multilingual Digital Single Market, I was elected the chair of the META-NET Executive Board.
2015/04/20 I will take part in the new CLARIN PLUS eInfra project, just announced (Czech / CUNI PI: Pavel Stranak, UFAL).
2014/09/29 The first ICT H2020 Call results have been announced. Barring any critical problems, I will be the Charles University in Prague PI in CRACKER CSA, HimL Innovation Action and QT21 Research and Innovation Action projects, and will take part in KConnect IA project (CUNI's PI: Pavel Pecina).
My research interests have evolved from morphology and tagging of inflective languages (lexicons, analysis and generation tools - see this demo) to machine translation (French-English while at IBM and Czech-English; also, Czech-Russian and other closely related languages). I am also interested in parsing (see e.g. the CLSP Workshop on parsing Czech) and generation. However, in the past 10 years, I have devoted most of my research time to creating linguistic resources, such as the Prague Dependency Treebank family of projects (Czech, English, Arabic), and to managing new research projects, mainly funded by the EU (see below for a complete list).
I am also interested in spoken language understanding. I participated in the Malach project, now finishing, where I worked on language modeling (for ASR), on thesaurus translation and on the Czech IR test collection.
I work closely not only with my students, but also with other Czech and foreign teams, such as the University of West Bohemia in the Czech Republic, Center for Speech and Language Processing at the JHU, Center for Spoken Language Research at CU-Boulder, Linguistic Data Consortium, and several European Universities on EU projects (see below).
I am or have been the PI or the national PI of several major Czech, EU and NSF (US) research projects. The list of current (or very recently finished) projects is below.
|2010-2015, extended to 2019||LINDAT/CLARIN, Large infrastructural grant for language resources, data access and distribution and related research, project LM2010013 and since 2016 as LM2015071 of the Ministry of Education of the Czech Republic|
|2016-2019||VIADAT, Virtual Assistant for Access to Oral History Archives, with the Institute of Contemporary History of the Academy of Sciences of the Czech Republic and the National Film Archive of the Czech Republic (PI)|
|2015-2017||CRACKER, Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research. H2020 CSA, PI of the Czech partner, Charles University in Prague. Under negotiation. Coordinated by Hans Uszkoreit, DFKI Berlin, Germany.|
|2015-2018||HimL, Health in my Language. H2020 Innovation Action. PI of the Czech partner, Charles University in Prague. Under negotiation. Coordinated by Barry Haddow, University of Edinburgh, Scotland.|
|2015-2018||QT21, Quality Translation 21. H2020 Research and Innovation Action. PI of the Czech partner, Charles University in Prague. Under negotiation. Coordinated by Barry Haddow, University of Edinburgh, Scotland.|
|2013-2016||QTLeap, Quality Translation by Deep Language Engineering Approaches. FP7 STREP project. PI of the Czech partner, Charles University in Prague. Coordinated by Antonio Branco, FCUL, Lisbon, Portugal.|
|2011-2015||AMALACH, Access to multilingual archives, with ZCU in Pilsen and USC, Los Angeles, USA, a Czech Ministry of Culture applied project. PI of the grant.|
|2011-2014||EUDAT, European Data Infrastructure, Large infrastructural EU project. PI of the Czech partner, Charles University in Prague.|
|2010-2014||Khresmoi, IP of the 7th FP of the EU (Coordinator: Henning Müller, HES-SO, Switzerland)|
|2010-2013||META-NET, Network of Excellence of the 7th FP of the EU - Building the Multilingual Europe Technology Alliance (coordinated by DFKI, Berlin, Prof. Hans Uszkoreit)|
|2010-2013||Faust, STREP of the 7th FP of the EU - Feedback analysis for improved Statistical Machine Translation (Coordinator: William Byrne, University of Cambridge)|
|2009-2012||EuroMatrixPlus, STREP of the 7th FP of the EU (Coordinator: Hans Uszkoreit, Univ. of Saarland, Germany)|
|2006-2010||Companions, IP of the 6th FP of the EU - Conversational Dialogue system (Coordinator: Yorick Wilks, Univ. of Sheffield, GB)|
|2005-2011||Center for Computational Linguistics, a virtual Center for joint research with the University of West Bohemia, Masaryk University of Brno, and the Institute of the Czech Language in Prague|
|2006-2010||PIRE, a project funded by the NSF to promote U.S. graduate student education in Europe. Topic: Investigation of Meaning Representations in Language Understanding for Speech Reconstruction and Machine Translation Systems.|
|2002-2007||Malach, project for automatic speech recognition (in many languages) of taped interviews with Holocaust survivors, collected by the Shoah Visual History Foundation. Also, Information Retrieval experiments and resource creation.|
Before that, I was the PI or Co-PI of many other projects, such as the Czech Grant-Agency supported highly collaborative, nation-wide Czech National Corpus project (2003-2006), of several collaborative grants for mutual visits to/from U.S. institutions (Johns Hopkins University, University of Pennsylvania, Univ. of Colorado), and of several smaller subcontracting grants (such as the U.S.-based GALE project). In the 90s, I was the Czech PI of several collaborative EU projects specifically aimed at the former Soviet Bloc countries (EU project STEEL, EU project CEGLEX).
I have been working on some other grants as a researcher as well, such as the predecessor Center for Computational Linguistics (2000-2004), the Laboratory for Linguistic Data (1996-2000), Czech-English MT project supported by the Czech Grant Agency MATRACE (1993-1995), and many smaller projects.
Several industrial projects have got my attention as well, such as the Czech Grammar Checker project and certain lexicon(s) for Microsoft, morphological databases for companies like IBM, Xerox, Lotus, Morphologic, Zi Corp., Lernout & Hauspie, and cooperation on product development for several Czech companies, such as ASPI (legal information system using NL search), Oracle (the Oracle Context product) and morphological dictionary development for the Czech and Slovak portal centrum.cz and centrum.sk.
|2003-||Institute of Formal and Applied Linguistics, School of Computer Science, Faculty of Mathematics and Physics, Charles University in Prague. Vice-director (2012-). LINDAT/CLARIN infrastructural project director/coordinator (2010-). Director (2003-2011). Acting director (2001-2003, 2011-2012).|
|2016-||Department of Computer Science, University of Colorado in Boulder. Adjunct Professor.|
|2008-||Full Professor of the Charles University in Prague|
|2003-2007||Associate Professor of the Charles University in Prague|
|2002||Team Leader, CLSP JHU Summer Workshop, Generation in the Context of Machine Translation|
|1999-2000||Visiting Assistant Professor, Computer Science Dept. and Center for Speech and Language Processing, Johns Hopkins University, Baltimore, MD, USA. Teaching "Introduction to NLP" (two semesters) and "Data Structures"|
|1998||Team Leader, CLSP JHU Summer Workshop, Core Natural Language Processing Technology Applicable to Multiple Languages|
|1994||PhD ("Dr.") in Computational Linguistics, Faculty of Mathematics and Physics, Charles University in Prague. Topic: Computational Morphology of Czech.|
|1993-2003||Researcher, Assistant Professor, Institute of Formal and Applied Linguistics, School of Computer Science, Faculty of Mathematics and Physics, Charles University in Prague.|
|1991-1993||Visiting Scientist, IBM T.J.Watson Research Center, Yorktown Heights, NY, USA. Project: Candide (Statistical Machine Translation French -> English, project head(s): Robert Mercer, Peter Brown)|
|1990,1991||Visiting Scientist, ISSCO, Univ. of Geneva, Switzerland. Project: Multilingual Morphological Analysis.|
|1984-1991||Researcher, Research Institute of Mathematical Machines, Prague. Project: Machine Translation Czech -> Russian (software documentation).|
|1979-1984||Bc. & Master Degree study, Faculty of Mathematics and Physics, Charles University in Prague (high honors, RNDr. 1984, thesis topic: Natural Language Robot Control).|
I am now teaching an adapted version of the "Introduction to (statistical) NLP" course which I developed while at JHU. The current course is divided into two parts: NPFL067 and NPFL068. Please see also my Hopkins' archive web pages for more information and the complete set of foils in html form.
My other current and former teaching at Charles University in Prague can be found here; and this is a direct link to my current courses in the Charles University information system.
This section is new (in 2012); it contains talks starting in 2012, but it will be extended backwards at some point. Also there are certainly some talks missing... apologies.
|2012, Dec. 9||Second Workshop on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT (ML4HMT-12), Mumbai, India||Deep Linguistic Information in Hybrid Machine Translation|
|2012, Nov. 21||Preserving Survivors' Memories, Berlin, Germany||Language Technology Research: Serving eHumanities (New Ways of Accessing the USC Shoah Foundation Archive at the Center for Visual History Malach)|
General Conference Chair
|2010||ACL'10, Uppsala, Sweden|
Program Committee Chair, Co-chair
|2014||Coling 2014, Dublin, Ireland; co-chair with Jun-ichi Tsujii.|
|2012||META-RESEARCH Workshop on Advanced Treebanking, LREC 2012, Istanbul, Turkey (with Koenraad De Smedt, Antonio Branco and Marko Tadic).|
|2007||TLT'07 (Treebanks and Linguistic Theories), Bergen, Norway|
|2006||TLT'06 (Treebanks and Linguistic Theories), Prague, Czech Rep.|
|2003||EACL'03 (European ACL Conference), Budapest, Hungary|
|2002||EMNLP'02 (Empirical Methods in NLP), Philadelphia, PA, USA|
|1999||Thematic Session on "Parsing inflective and free word order languages" ACL '99, June 1999, College Park, MD, USA|
Program Committee Area Chair, Full PC Member
|2004||EMNLP'04, Barcelona, Spain|
|2004||EAMT Workshop, La Valetta, Malta|
|2002||ACL'02, Philadelphia, PA, USA|
|1995||EACL'95, Dublin, Ireland|
|2003-||Text, Speech and Dialog Conference, Czech Rep., (standing) PC (SC) Member|
I have also served as a reviewer at an additional 38 conferences or workshops (between 1994 and 2013).
Organization of conferences and workshops
|2014||JHU Summer Workshop for Speech and Language Processing, July 2014, Prague, Czech Rep. (in cooperation with Johns Hopkins Univ., Baltimore, MD, USA)|
|2012||META-RESEARCH Workshop on Advanced Treebanking, LREC 2012, Istanbul, Turkey.|
|2007||ACL'07 and EMNLP'07, Prague, Czech Republic (Local Coordinator)|
|2006||TLT'06, Prague, Czech Republic|
|2006-2010||Vilem Mathesius Courses (Schools), Prague, Czech Republic|
|2015-||Executive Board of META-NET, chair.|
|2012-2015||Member of the joint Clarin DE / Dariah DE Technical Advisory Board (Germany).|
|2012-2014||Member of the International Advisory Board, Clarin NL (Netherlands).|
|2012-||Member of the International Committee on Computational Linguistics (ICCL).|
|2013-2017||Member of the Management Committee (for Czech Republic) for the COST IC1207 Action of the ESF, within the 7th FP EU (PARSEME, IC1207).|
|2012-2016||Member of the Standing Committee for CLARIN Technical Centres (SCCTC), of the EU-wide language resource infrastructure Clarin ERIC (1st and 2nd term).|
|2012-2016||Member of the Scientific Council of the Faculty of Mathematics and Physics, Charles University in Prague (2nd term)|
|2012-2016||Member of the Council of the core research PRVOUK project, awarded to the Computer Science School by the Charles University in Prague.|
|2011-2019||Research Council of the Technology Agency of the Czech Republic, member (now in 2nd term)|
|2011-2012||Expert panel of the Coordinating Committee on the strategy of applied research in the Czech Republic ("Priorities 2030") of the Council for Science, Research and Innovations of the Czech Republic|
|2011-||Steering Committee for the establishment of the Transactions of the Association for Computational Linguistics journal.|
|2011-2012||Scientific Council of the Faculty of Mathematics and Physics, Charles University in Prague (1st term)|
|2010-2014||Subcommittee for social sciences and humanities, Council for Science, Research and Innovations of the government of the Czech Republic|
|2008-2010||Computational Linguistics, Editorial Board Member|
|2003-||NSF Panels (ITR, HLT)|
|2002-||ACL SIGDAT Advisory Board member|
|1999-2002||TEI Consortium Board of Directors Member, ACL Representative|
|1998-1999||TEI Steering Committee Member, ACL Representative|
|1997-||EU Evaluation Committee(s), Research Projects|
|1996-||Grant Agency of the Czech Republic, reviewer (Linguistic and Computer Science Programs)|
|1995-1996||European Chapter of the ACL Advisory Board Member|
|1990-||Czech National Corpus Founding Member|
|2012||Silver Medal I of the Faculty of Mathematics and Physics, Charles University in Prague.|
|2009||Award of the Academy of Sciences of the Czech Republic for the best research project in the programme "Information Society" 2005-2009 (Project: "From natural language to the semantic web")|
|2005||Co-author of a best student paper at EMNLP 2005, Vancouver, with Ryan McDonald, Fernando Pereira and Kiril Ribarov: "Non-projective Dependency Parsing using Spanning Tree Algorithms"|
|2001||Silver Medal of the Charles University in Prague (for the Czech National Corpus)|
I am a member of the ACL, ISCA, ACM, IEEE, Czech Cybernetics Society and the Prague Linguistic Circle.
You might want to visit my previous page(s) and teaching pages at http://www.cs.jhu.edu/~hajic.
You might also want to visit our Institute's pages at http://ufal.mff.cuni.cz