AlignFix: A Tool for Parallel Corpora Augmentation and Refinement

Samuel Frontull, Simon Haller-Seeber


Abstract
High-quality datasets are crucial for training effective state-of-the-art machine translation systems. However, because these systems are data-intensive, they have to be trained on amounts of text that can easily exceed what humans can fully inspect. As a result, noise that degrades overall system performance is a frequent and significant issue. Various approaches have been developed to identify and select only the highest-quality training examples, but discarding data in this way is undesirable in scenarios where resources are limited. For this reason, we introduce AlignFix, an open-source tool for augmenting parallel corpora and for identifying and correcting errors in them. Leveraging word alignments, AlignFix extracts consistent phrase pairs, enabling targeted replacements that can improve dataset quality. Besides targeted replacements, the tool enables contextual augmentation by duplicating sentences and allowing users to substitute words with alternatives of their choice. The tool maintains and updates the underlying word alignments, thereby avoiding costly recomputation. AlignFix runs locally in the browser, requires no installation, and ensures that all data remains entirely on the client side. It is released under the Apache 2.0 license, encouraging broad adoption, reuse, and further development. A live demo is available at https://ifi-alignfix.uibk.ac.at.
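
To illustrate what "consistent phrase pairs" means in the abstract, the following is a minimal sketch of the textbook consistency check used in phrase-based machine translation. It is not taken from AlignFix: the function name `extract_phrase_pairs`, the `max_len` parameter, and the example sentence pair are all assumptions made for illustration, and the sketch does not extend phrases over unaligned target words. A source span and a target span are treated as consistent when no alignment link connects a word inside one span to a word outside the other.

```python
from typing import List, Set, Tuple


def extract_phrase_pairs(
    src: List[str],
    tgt: List[str],
    alignment: Set[Tuple[int, int]],
    max_len: int = 4,
) -> List[Tuple[str, str]]:
    """Return phrase pairs consistent with a word alignment.

    `alignment` holds (source_index, target_index) links. This is an
    illustrative sketch, not the AlignFix implementation.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue  # a fully unaligned source span yields no pair
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: every link into the target span must come
            # from inside the source span.
            consistent = all(
                i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2
            )
            if consistent:
                pairs.append(
                    (" ".join(src[i1 : i2 + 1]), " ".join(tgt[j1 : j2 + 1]))
                )
    return pairs


# Hypothetical example with a reordering between "es gesehen" and "seen it".
src = ["ich", "habe", "es", "gesehen"]
tgt = ["I", "have", "seen", "it"]
alignment = {(0, 0), (1, 1), (2, 3), (3, 2)}
pairs = extract_phrase_pairs(src, tgt, alignment)
```

In this example, "es gesehen" pairs with "seen it", while "habe es" yields no pair because its target span would also cover "seen", which is linked to a source word outside the span. Replacing only within such consistent pairs is what makes a targeted substitution safe on both sides of the corpus.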
Anthology ID:
2026.eacl-demo.17
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Danilo Croce, Jochen Leidner, Nafise Sadat Moosavi
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
215–224
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-demo.17/
Cite (ACL):
Samuel Frontull and Simon Haller-Seeber. 2026. AlignFix: A Tool for Parallel Corpora Augmentation and Refinement. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations), pages 215–224, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
AlignFix: A Tool for Parallel Corpora Augmentation and Refinement (Frontull & Haller-Seeber, EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-demo.17.pdf