Who we are

The Institute of Formal and Applied Linguistics (ÚFAL) is one of the specialized departments at the Faculty of Mathematics and Physics, Charles University in Prague. We are a group of research scientists, postdocs, programmers, teachers and students working on a broad variety of topics in the dynamic field of computational linguistics.

With a tradition going back to the 1960s, we draw on our experience to achieve top results in natural language processing (NLP) worldwide, as evidenced by our many international publications and the books we publish. We are concerned not only with models of language and linguistic theories; we also work on many projects and applications put into practice by both state institutions and private companies.

Apart from research activities, we also run a comprehensive teaching programme for the Master's degree (Mgr., or MSc.), in both Czech and English, as well as for a doctorate (Ph.D.) in Computational Linguistics. At the Bachelor's level, students can choose the profile Mathematical Linguistics within the field of General Informatics. The Institute is also a member of the double-degree Master's LCT programme of the EU.

What we do

The history of ÚFAL is closely tied to the development of the Functional generative description, an influential linguistic framework developed by Petr Sgall and his colleagues. The theory treats the sentence as a system of interlinked layers: phonological, morphematical, morphonological, analytical (surface syntax) and tectogrammatical (deep syntax). On the basis of this well-elaborated framework, the ÚFAL team has built a whole family of dependency treebanks, which are useful not only for the widely studied task of machine translation.

The oldest and biggest treebank is the Prague Dependency Treebank (PDT), which has a large number of users all around the world. The latest version, 2.5, has been adapted to the needs of current computational linguistics research. The corpus itself uses the latest annotation technology, and software tools for corpus search, annotation and language analysis are included as its components. Below you can see the scheme of a typical PDT sentence:

 

Following the same principles, we have built treebanks for other languages, such as English and Arabic. We also provide PDT extensions focused on other layers of language (e.g. discourse or sentiment).

We develop a range of tools for NLP, most notably Treex, a complete, modular NLP toolkit for Czech and several other languages. Try it online!

You can also try our smart spell-checker and diacritics restorer/remover.

Machine translation

Machine translation (MT) is a hot topic of research at our department. ÚFAL regularly participates in the MT competitions held at the Workshop on Statistical Machine Translation. In the latest evaluation, our system Chimera ranked as the best English→Czech MT system, ahead of even Google Translate.

Chimera is a combination of two fundamentally different approaches: the statistical system Moses, and our own linguistically oriented system TectoMT, which builds on the strong linguistic theory of Functional generative description and combines it with state-of-the-art machine learning methods.

 

Try an online demo of our MT for the Khresmoi project (an MT system specialized in translating search queries in the medical domain).

Statistical dialogue systems

Spoken dialogue systems combine several very complex NLP tasks: they require high-quality speech recognition of the user's input, advanced modelling of semantics and dialogue state, and speech synthesis of the output. We have an active research group within the project Vystadial. The goal of the project is to study and improve statistical methods for training the models used in complex dialogue systems. Thanks to the Vystadial team, you can find your public transport connection within Prague! Call our dialogue system ALEX for free (in Czech): 800 899 998.

We were also involved in the project Companions, in which we created Petra, an avatar for human-computer interaction. To chat with Petra (in Czech), add the user czech.companion@gmail.com to your Gmail.

Malach Centre for visual history

The Malach Centre for visual history provides local access to the extensive digital archives of the USC Shoah Foundation, which contain over 50,000 witness testimonies covering the history of the entire 20th century. ÚFAL has participated in developing tools for linguistic processing of the Czech data.

Games

We have developed a number of games with a purpose, which you can try online.

Events

ÚFAL has organized a number of conferences, such as the Annual Meeting of the Association for Computational Linguistics (ACL) in 2007, Depling in 2013, and the Machine Translation Marathons in 2009 and 2013.

The Fred Jelinek seminar series, a loose series of lectures held in honour of the late Professor Frederick Jelinek, regularly features the most prominent researchers in the field. Video recordings of these lectures, as well as of our regular Monday seminars, are available online.

... and many more!

Apart from the projects mentioned above, we also work on a range of other NLP tasks, including automatic speech recognition, information retrieval, machine learning, neurolinguistics, opinion mining, language teaching applications and many more. To find out more, join our team in the beautiful historical centre of Prague!

Why study at ÚFAL

  • ÚFAL is one of the top internationally recognized departments in the modern and widely applicable field of computational linguistics.

  • We provide many interesting courses that take students from the very basics of the field to its exciting details, and we also offer the opportunity to participate in many grants and in both Czech and international projects.

  • Our staff and students have many opportunities to travel abroad, whether for conferences, workshops and summer schools, or for educational exchanges and research fellowships, e.g. at Johns Hopkins University, Baltimore (USA), Saarland University, Saarbrücken (Germany) and many other top-ranking institutions all over the world.

  • We have at our disposal up-to-date computer technology for the most demanding computations.

  • Our graduates find employment in leading companies in the field, as well as in the broader domain of informatics.

Some of our alumni

Jan Cuřín - now at IBM, Prague

Martin Čmejrek - now at IBM, New York

Jiří Havelka - now at IBM, Prague

Magda Hnátková - now at Arriba, San Francisco

Pavel Krbec - now at CET21, Prague

Pavel Květoň - now at IBM, Prague

Martin Majliš - now at Amazon, Toronto

Petr Pajas - now at Google, Zürich

Petr Podveský - now at RWE, Prague

Jan Rouš - now at Google, Mountain View

Jiří Semecký - now at Google, Zürich

Otakar Smrž - now at Seznam, Prague

Jan Štěpánek - now at Barclays, Prague