Saša Hasan

Also published as: Sasa Hasan

2023

Automating Behavioral Testing in Machine Translation
Javier Ferrando | Matthias Sperber | Hendra Setiawan | Dominic Telaar | Saša Hasan
Proceedings of the Eighth Conference on Machine Translation

Behavioral testing in NLP allows fine-grained evaluation of systems by examining their linguistic capabilities through the analysis of input-output behavior. Unfortunately, existing work on behavioral testing in Machine Translation (MT) is currently restricted to largely handcrafted tests covering a limited range of capabilities and languages. To address this limitation, we propose to use Large Language Models (LLMs) to generate a diverse set of source sentences tailored to test the behavior of MT models in a range of situations. We can then verify whether the MT model exhibits the expected behavior through matching candidate sets that are also generated using LLMs. Our approach aims to make behavioral testing of MT systems practical while requiring only minimal human effort. In our experiments, we apply our proposed evaluation framework to assess multiple available MT systems, revealing that while in general pass-rates follow the trends observable from traditional accuracy-based metrics, our method was able to uncover several important differences and potential bugs that go unnoticed when relying only on accuracy.

This work presents improvements of a large-scale Arabic to French statistical machine translation system over a period of three years. The development includes better preprocessing, more training data, additional genre-specific tuning for different domains, namely newswire text and broadcast news transcripts, and improved domain-dependent language models. Starting with an early prototype in 2005 that participated in the second CESTA evaluation, the system was further upgraded to achieve favorable BLEU scores of 44.8% for the text and 41.1% for the audio setting. These results are compared to a system based on the freely available Moses toolkit. We show significant gains both in terms of translation quality (up to +1.2% BLEU absolute) and translation speed (up to 16 times faster) for comparable configuration settings.

Automatic Evaluation Measures for Statistical Machine Translation System Optimization
Arne Mauser | Saša Hasan | Hermann Ney
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Evaluation of machine translation (MT) output is a challenging task. In most cases, there is no single correct translation. In the extreme case, two translations of the same input can have completely different words and sentence structure while still both being perfectly valid. Large projects and competitions for MT research raised the need for reliable and efficient evaluation of MT systems. For the funding side, the obvious motivation is to measure performance and progress of research. This often results in a specific measure or metric being taken as the primary evaluation criterion. Do improvements in one measure really lead to improved MT performance? How does a gain in one evaluation metric affect other measures? This paper addresses these questions through a number of experiments.

Triplet Lexicon Models for Statistical Machine Translation
Saša Hasan | Juri Ganitkevitch | Hermann Ney | Jesús Andrés-Ferrer
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

2007

Are Very Large N-Best Lists Useful for SMT?
Saša Hasan | Richard Zens | Hermann Ney
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

A Systematic Comparison of Training Criteria for Statistical Machine Translation
Richard Zens | Saša Hasan | Hermann Ney
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

Creating a Large-Scale Arabic to French Statistical Machine Translation System
Saša Hasan | Anas El Isbihani | Hermann Ney
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this work, the creation of a large-scale Arabic to French statistical machine translation system is presented. We introduce all necessary steps, from corpus acquisition and data preprocessing to training and optimizing the system and its eventual evaluation. Since no corpora existed previously, we collected large amounts of data from the web. Arabic word segmentation was crucial to reduce the overall number of unknown words. We describe the phrase-based SMT system used for training and generation of the translation hypotheses. Results on the second CESTA evaluation campaign are reported. The setting was in the medical domain. The prototype reaches a favorable BLEU score of 40.8%.

The RWTH statistical machine translation system for the IWSLT 2006 evaluation
Arne Mauser | Richard Zens | Evgeny Matusov | Sasa Hasan | Hermann Ney
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign

Reranking Translation Hypotheses Using Structural Properties
Saša Hasan | Oliver Bender | Hermann Ney
Proceedings of the Workshop on Learning Structured Information in Natural Language Applications

A Flexible Architecture for CAT Applications
Saša Hasan | Shahram Khadivi | Richard Zens | Hermann Ney
Proceedings of the 11th Annual Conference of the European Association for Machine Translation

2005

Statistical Machine Translation of European Parliamentary Speeches
David Vilar | Evgeny Matusov | Sasa Hasan | Richard Zens | Hermann Ney
Proceedings of Machine Translation Summit X: Papers

In this paper we present the ongoing work at RWTH Aachen University for building a speech-to-speech translation system within the TC-Star project. The corpus we work on consists of parliamentary speeches held in the European Plenary Sessions. To our knowledge, this is the first project that focuses on speech-to-speech translation applied to a real-life task. We describe the statistical approach used in the development of our system and analyze its performance under different conditions: dealing with syntactically correct input, dealing with the exact transcription of speech and dealing with the (noisy) output of an automatic speech recognition system. Experimental results show that our system is able to perform adequately in each of these conditions.