
Institute of Formal and Applied Linguistics

at Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic



Year 2016
Type article
Status published
Language English
Author(s) Attia, Mohammed; Pecina, Pavel; Samih, Younes; Shaalan, Khaled; Genabith, Josef
Title Arabic Spelling Error Detection and Correction
Czech title Detekce a korekce chyb v Arabštině (English: Detection and Correction of Errors in Arabic)
Journal Natural Language Engineering
Volume 22
Number 5
Pages range 751-773
Supported by 2012-2018 GBP103/12/G084 (Centrum pro multi-modální interpretaci dat velkého rozsahu); 2012-2016 PRVOUK P46 (Informatika)
Czech abstract (translated into English) This work addresses automatic spell checking for Arabic and shows how improving the individual components (dictionary, language model, error model) leads to a cumulative improvement of the whole system.
English abstract A spelling error detection and correction application is typically based on three main components: a dictionary (or reference word list), an error model and a language model. While most of the attention in the literature has been directed to the language model, we show how improvements in any of the three components can lead to significant cumulative improvements in the overall performance of the system. We develop our dictionary of 9.2 million fully-inflected Arabic words (types) from a morphological transducer and a large corpus, validated and manually revised. We improve the error model by analyzing error types and creating an edit distance re-ranker. We also improve the language model by analyzing the level of noise in different data sources and selecting an optimal subset to train the system on. Testing and evaluation experiments show that our system significantly outperforms Microsoft Word 2013, OpenOffice Ayaspell 3.4 and Google Docs.
Specialization linguistics ("jazykověda")
Confidentiality default – not confidential
Open access no
ISSN* 1351-3249
Publisher* Cambridge University Press
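
The three components named in the English abstract (dictionary, error model, language model) can be illustrated with a minimal sketch. This is not the authors' system: the toy Latin-script word list, the counts, and the names `DICTIONARY`, `COUNTS`, `detect_errors` and `correct` are all hypothetical, plain Levenshtein distance stands in for the paper's edit distance re-ranker, and unigram frequency stands in for its language model.

```python
from collections import Counter

# Hypothetical toy resources; the paper's dictionary holds 9.2 million Arabic word types.
DICTIONARY = {"spelling", "spilling", "selling", "error", "model"}
COUNTS = Counter({"spelling": 50, "spilling": 5, "selling": 20, "error": 30, "model": 40})


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def detect_errors(tokens, dictionary):
    """Detection: a token is flagged when it is absent from the dictionary."""
    return [t for t in tokens if t not in dictionary]


def correct(word, dictionary, counts, max_distance=2):
    """Correction: rank dictionary words by edit distance, then by frequency."""
    if word in dictionary:
        return word
    candidates = [(edit_distance(word, w), w) for w in dictionary]
    candidates = [(d, w) for d, w in candidates if d <= max_distance]
    if not candidates:
        return word  # nothing close enough; leave the token unchanged
    # Smaller distance wins; among equal distances the more frequent word wins.
    _, best = min(candidates, key=lambda dw: (dw[0], -counts[dw[1]], dw[1]))
    return best
```

With these toy resources, `correct("spalling", DICTIONARY, COUNTS)` has two candidates at distance 1 ("spelling" and "spilling"), and the frequency counts decide in favour of "spelling". This shows in miniature how the error model and the language model contribute separately to the final ranking.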


