Jon Dehdari


The Effect of Error Rate in Artificially Generated Data for Automatic Preposition and Determiner Correction
Fraser Bowen | Jon Dehdari | Josef van Genabith
Proceedings of the 3rd Workshop on Noisy User-generated Text

In this research we investigate the impact of mismatches in the density and type of error between training and test data on a neural system correcting preposition and determiner errors. We use synthetically produced training data to control error density and type, and “real” error data for testing. Our results show it is possible to combine error types, although prepositions and determiners behave differently in terms of how much error should be artificially introduced into the training data in order to get the best results.

Massively Multilingual Neural Grapheme-to-Phoneme Conversion
Ben Peters | Jon Dehdari | Josef van Genabith
Proceedings of the First Workshop on Building Linguistically Generalizable NLP Systems

Grapheme-to-phoneme conversion (g2p) is necessary for text-to-speech and automatic speech recognition systems. Most g2p systems are monolingual: they require language-specific data or handcrafted rules. Such systems are difficult to extend to low-resource languages, for which data and handcrafted rules are not available. As an alternative, we present a neural sequence-to-sequence approach to g2p that is trained on spelling–pronunciation pairs in hundreds of languages. The system shares a single encoder and decoder across all languages, allowing it to exploit the intrinsic similarities between different writing systems. We show an 11% improvement in phoneme error rate over an approach based on adapting high-resource monolingual g2p models to low-resource languages. Our model is also much more compact than previous approaches.

Common Round: Application of Language Technologies to Large-Scale Web Debates
Hans Uszkoreit | Aleksandra Gabryszak | Leonhard Hennig | Jörg Steffen | Renlong Ai | Stephan Busemann | Jon Dehdari | Josef van Genabith | Georg Heigold | Nils Rethmeier | Raphael Rubino | Sven Schmeier | Philippe Thomas | He Wang | Feiyu Xu
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

Web debates play an important role in enabling broad participation of constituencies in social, political and economic decision-making. However, it is challenging to organize, structure, and navigate a vast number of diverse arguments and comments collected from many participants over a long time period. In this paper we demonstrate Common Round, a next-generation platform for large-scale web debates, which provides functions for eliciting the semantic content and structures from the contributions of participants. In particular, Common Round applies language technologies to extract the semantic essence of textual input and to aggregate the opinions and arguments that participants formulate. The platform also provides cross-lingual access to debates using machine translation.


BIRA: Improved Predictive Exchange Word Clustering
Jon Dehdari | Liling Tan | Josef van Genabith
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Using Related Languages to Enhance Statistical Language Models
Anna Currey | Alina Karakanta | Jon Dehdari
Proceedings of the NAACL Student Research Workshop

Scaling Up Word Clustering
Jon Dehdari | Liling Tan | Josef van Genabith
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations


An Awkward Disparity between BLEU / RIBES Scores and Human Judgements in Machine Translation
Liling Tan | Jon Dehdari | Josef van Genabith
Proceedings of the 2nd Workshop on Asian Translation (WAT2015)


Morphological Features for Parsing Morphologically-rich Languages: A Case of Arabic
Jon Dehdari | Lamia Tounsi | Josef van Genabith
Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages


Refining the most frequent sense baseline
Judita Preiss | Jon Dehdari | Josh King | Dennis Mehay
Proceedings of the Workshop on Semantic Evaluations: Recent Achievements and Future Directions (SEW-2009)