This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
FedericoGarcea
Fixing paper assignments
Please select all papers that do not belong to this person.
Indicate below which author they should be assigned to.
Increasing attention is being dedicated by the NLP community to gender-fair practices, including emerging forms of non-binary language. Given the shift to the prompting paradigm for multiple tasks, direct comparisons between prompted and fine-tuned models in this context are lacking. We aim to fill this gap by comparing prompt engineering and fine-tuning techniques for gender-fair rewriting in Italian. We do so by framing a rewriting task where Italian gender-marked translations from English gender-ambiguous sentences are adapted into a gender-neutral alternative using direct non-binary language. We augment existing datasets with gender-neutral translations and conduct experiments to determine the best architecture and approach to complete such task, by fine-tuning and prompting seq2seq encoder-decoder and autoregressive decoder-only models. We show that smaller seq2seq models can reach good performance when fine-tuned, even with relatively little data; when it comes to prompts, including task demonstrations is crucial, and we find that chat-tuned models reach the best results in a few-shot setting. We achieve promising results, especially in contexts of limited data and resources.
We approach the task of assessing the suitability of a source text for translation by transferring the knowledge from established MT evaluation metrics to a model able to predict MT quality a priori from the source text alone. To open the door to experiments in this regard, we depart from reference English-German parallel corpora to build a corpus of 14,253 source text-quality score tuples. The tuples include four state-of-the-art metrics: cushLEPOR, BERTScore, COMET, and TransQuest. With this new resource at hand, we fine-tune XLM-RoBERTa, both in a single-task and a multi-task setting, to predict these evaluation scores from the source text alone. Results for this methodology are promising, with the single-task model able to approximate well-established MT evaluation and quality estimation metrics - without looking at the actual machine translations - achieving low RMSE values in the [0.1-0.2] range and Pearson correlation scores up to 0.688.
In the domain of cuisine, both dishes and ingredients tend to be heavily rooted in the local context they belong to. As a result, the associated terms are often realia tied to specific cultures and languages. This causes difficulties for non-speakers of the local language and ma- chine translation (MT) systems alike, as it implies a lack of the concept and/or of a plausible translation. MT typically opts for one of two alternatives: keeping the source language terms untranslated or relying on a hyperonym/near-synonym in the target language, provided one exists. !Translate proposes a better alternative: explaining. Given a cuisine entry such as a restaurant menu item, we identify culture-specific terms and enrich the output of the MT system with automatically retrieved definitions of the non-translatable terms in the target language, making the translation more actionable for the final user.