
Institute of Formal and Applied Linguistics

at Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic



Year 2016
Type in proceedings
Status published
Language English
Author(s) Thanh, Le; Vu Throng, Hoa; Oberländer, Jonathan; Bojar, Ondřej
Title Using Term Position Similarity and Language Modeling for Bilingual Document Alignment
Czech title Využití podobnosti pozic termínů a jazykového modelování pro identifikaci dvojic dokumentů
Proceedings 2016: Stroudsburg, PA, USA: WMT 2016 (ACL): Proceedings of the First Conference on Machine Translation (WMT). Volume 2: Shared Task Papers
Pages range 710-716
How published online
URL http://www.statmt.org/wmt16/pdf/W16-2371.pdf
Supported by 2015-2018 H2020-ICT-2014-1-644402 (Himl (Health in my Language)) 2012-2016 PRVOUK P46 (Informatika)
Czech abstract (translated) The paper describes a method for finding pairs of documents that are translations of each other, based on the similar positions of keywords in the candidates and on similarity measured by an n-gram language model.
English abstract The WMT Bilingual Document Alignment Task requires systems to assign source pages to their “translations” within a large space of possible pairs. We present four methods: the first uses the term position similarity between candidate document pairs. The second method requires automatically translated versions of the target text and matches them with the candidates. The third and fourth methods try to overcome some of the challenges presented by the nature of the corpus, by considering the string similarity of source URL and candidate URL, and combining the first two approaches.
Specialization linguistics ("jazykověda")
Confidentiality default – not confidential
Open access no
DOI http://dx.doi.org/10.18653/v1/w16-2371
Editor(s)* Ondřej Bojar
ISBN* 978-1-945626-10-4
Address* Stroudsburg, PA, USA
Month* August
Venue* Humboldt University
Publisher* Association for Computational Linguistics
Institution* Association for Computational Linguistics
