Behzad Shayegh
2025
Feeding Two Birds or Favoring One? Adequacy–Fluency Tradeoffs in Evaluation and Meta-Evaluation of Machine Translation
Behzad Shayegh | Jan-Thorsten Peter | David Vilar | Tobias Domhan | Juraj Juraska | Markus Freitag | Lili Mou
Proceedings of the Tenth Conference on Machine Translation
We investigate the tradeoff between adequacy and fluency in machine translation. We show the severity of this tradeoff at the evaluation level and analyze where popular metrics fall within it. Current metrics generally lean toward adequacy, meaning that their scores correlate more strongly with the adequacy of translations than with their fluency. More importantly, we find that this tradeoff also persists at the meta-evaluation level, and that the standard WMT meta-evaluation favors adequacy-oriented metrics over fluency-oriented ones. We show that this bias is partially attributable to the composition of the systems included in the meta-evaluation datasets. To control this bias, we propose a method that synthesizes translation systems in meta-evaluation. Our findings highlight the importance of understanding this tradeoff in meta-evaluation and its impact on metric rankings.
2024
Tree-Averaging Algorithms for Ensemble-Based Unsupervised Discontinuous Constituency Parsing
Behzad Shayegh | Yuqiao Wen | Lili Mou
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We address unsupervised discontinuous constituency parsing, where we observe a high variance in the performance of the only previous model in the literature. We propose to build an ensemble of different runs of the existing discontinuous parser by averaging the predicted trees, to stabilize and boost performance. To begin with, we provide a comprehensive computational complexity analysis (in terms of P and NP-completeness) for tree averaging under different setups of binarity and continuity. We then develop an efficient exact algorithm to tackle the task, which runs in a reasonable time for all samples in our experiments. Results on three datasets show that our method outperforms all baselines in all metrics; we also provide in-depth analyses of our approach.