This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
AntonioCastaldo
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
Post-editing machine translation (MT) for creative texts, such as literature, requires balancing efficiency with the preservation of creativity and style. While neural MT systems struggle with these challenges, large language models (LLMs) offer improved capabilities for context-aware and creative translation. This study evaluates the feasibility of post-editing literary translations generated by LLMs. Using a custom research tool, we collaborated with professional literary translators to analyze editing time, quality, and creativity. Our results indicate that post-editing (PE) LLM-generated translations significantly reduce editing time compared to human translation while maintaining a similar level of creativity. The minimal difference in creativity between PE and MT, combined with substantial productivity gains, suggests that LLMs may effectively support literary translators.
UniOr PET is a browser-based platform for machine translation post-editing and a modern successor to the original PET tool. It features a user-friendly interface that records detailed editing actions, including time spent, additions, and deletions. Fully compatible with PET, UniOr PET introduces two advanced timers for more precise tracking of editing time and computes widely used metrics such as hTER, BLEU, and ChrF, providing comprehensive insights into translation quality and post-editing productivity. Designed with translators and researchers in mind, UniOr PET combines the strengths of its predecessor with enhanced functionality for efficient and user-friendly post-editing projects.
Large Language Models (LLMs) have demonstrated impressive performance in translating content across different languages and genres. Yet, their potential in the creative aspects of machine translation has not been fully explored. In this paper, we seek to identify the strengths and weaknesses inherent in different LLMs when applied to one of the most prominent features of creative works: the translation of idiomatic expressions. We present an overview of their performance in the EN→IT language pair, a context characterized by an evident lack of bilingual data tailored for idiomatic translation. Lastly, we investigate the impact of prompt design on the quality of machine translation, drawing on recent findings which indicate a substantial variation in the performance of LLMs depending on the prompts utilized.
Natural Language Processing (NLP) research and development has experienced rapid progression in the recent times due to advances in deep learning. The introduction of pre-trained large language models (LLMs) is at the core of this transformation, significantly enhancing the performance of machine translation (MT) and speech technologies. This development has also led to fundamental changes in modern translation and speech tools and their methodologies. However, there remain challenges when extending this progress to underrepresented dialects and low-resource languages, primarily due to the need for more data. This paper details our submissions to the IWSLT speech translation (ST) tasks. We used the Whisper model for the automatic speech recognition (ASR) component. We then used mBART and NLLB as cascaded systems for utilising their MT capabilities. Our research primarily focused on exploring various dialects of low-resource languages and harnessing existing resources from linguistically related languages. We conducted our experiments for two morphologically diverse language pairs: Irish-to-English and Maltese-to-English. We used BLEU, chrF and COMET for evaluating our MT models.
This system description paper presents SETU-ADAPT’s submission to the WMT 2024 Biomedical Shared Task, where we participated for the language pairs English-to-French and English-to-German. Our approach focused on fine-tuning Large Language Models, using in-domain and synthetic data, employing different data augmentation and data retrieval strategies. We introduce a novel MT framework, involving three autonomous agents: a Translator Agent, an Evaluator Agent and a Reviewer Agent. We present our findings and report the quality of the outputs.
This paper presents the SETU-ADAPT submissions to the WMT24 Chat Translation Task. Large language models (LLM) currently provides the state-of-the-art solutions in many natural language processing (NLP) problems including machine translation (MT). For the WMT24 Chat Translation Task we leveraged LLMs for their MT capabilities. In order to adapt the LLMs for a specific domain of interest, we explored different fine-tuning and prompting strategies. We also employed efficient data retrieval methods to curate the data used for fine-tuning. We carried out experiments for two language pairs: German-to-English and French-to-English. Our MT models were evaluated using three metrics: BLEU, chrF and COMET. In this paper we describes our experiments including training setups, results and findings.