This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
Felix Gaschi
Also published as:
Félix Gaschi
With Large Language Models (LLMs) becoming increasingly multilingual, effective knowledge editing (KE) needs to propagate edits across languages. The evaluation of existing methods for cross-lingual knowledge editing (CKE) is limited with respect to both edit effectiveness, since benchmarks do not account for entity aliases and use faulty entity translations, and robustness, since existing work does not report on the downstream generation and task-solving abilities of LLMs after editing. In this work, we aim to (i) maximize the effectiveness of CKE while (ii) minimizing the extent of downstream model collapse caused by the edits. To accurately measure the effectiveness of CKE methods, we introduce BabelEdits, a new CKE benchmark covering 60 languages that combines high-quality multilingual synsets from BabelNet with marker-based translation to ensure entity translation quality. Unlike existing CKE benchmarks, BabelEdits accounts for the rich variety of entity aliases within and across languages. We then propose BabelReFT, a modular CKE approach based on representation fine-tuning (ReFT) that learns entity-scope ReFT modules and applies them to all multilingual aliases at inference. Our experimental results show that BabelReFT is not only more effective at CKE than state-of-the-art methods but also, owing to its modular design, much more robust against downstream model collapse when subjected to many sequential edits.
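The idea of entity-scope modules shared by all multilingual aliases can be illustrated with a small sketch. It assumes a LoReFT-style low-rank intervention h' = h + Rᵀ(Wh + b − Rh) as the per-entity module and a naive alias registry for routing; the class and method names (`LowRankIntervention`, `EntityEditRegistry`, `add_edit`, `apply`) are hypothetical and this is not the BabelReFT implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


@dataclass
class LowRankIntervention:
    """Assumed LoReFT-style edit: h' = h + R^T (W h + b - R h).

    R and W are r x d matrices (rows of R assumed orthonormal); b has length r.
    """
    R: List[Vector]
    W: List[Vector]
    b: Vector

    def __call__(self, h: Vector) -> Vector:
        # Correction along each of the r directions: (W h + b - R h)_i.
        deltas = [dot(w, h) + bi - dot(r, h) for r, w, bi in zip(self.R, self.W, self.b)]
        # Map the corrections back into the hidden space: R^T deltas.
        out = list(h)
        for r, d in zip(self.R, deltas):
            for k, rk in enumerate(r):
                out[k] += d * rk
        return out


@dataclass
class EntityEditRegistry:
    """Hypothetical router: every alias (any language) of an entity shares one module."""
    alias_to_entity: Dict[str, str] = field(default_factory=dict)
    modules: Dict[str, LowRankIntervention] = field(default_factory=dict)

    def add_edit(self, entity_id: str, aliases: List[str], module: LowRankIntervention) -> None:
        for alias in aliases:
            self.alias_to_entity[alias.casefold()] = entity_id
        self.modules[entity_id] = module

    def lookup(self, text: str) -> Optional[LowRankIntervention]:
        # Naive substring matching; a real system needs proper entity detection.
        lowered = text.casefold()
        for alias, entity_id in self.alias_to_entity.items():
            if alias in lowered:
                return self.modules[entity_id]
        return None

    def apply(self, text: str, h: Vector) -> Vector:
        # Representations are left untouched when no edited entity is mentioned,
        # which is what keeps unrelated inputs unaffected by the edit.
        module = self.lookup(text)
        return module(h) if module is not None else h


registry = EntityEditRegistry()
edit = LowRankIntervention(R=[[1.0, 0.0]], W=[[0.0, 0.0]], b=[5.0])
registry.add_edit("entity:rome", ["Rome", "Roma", "Рим"], edit)
print(registry.apply("Where is Roma?", [2.0, 3.0]))   # [5.0, 3.0]
print(registry.apply("Where is Paris?", [2.0, 3.0]))  # [2.0, 3.0]
```

Keeping one module per entity, rather than editing shared weights, is the modularity the abstract credits for robustness under many sequential edits; the sketch only shows the routing, not how the module is trained.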
Automatic structuring of dosage instructions (posology) is essential for making medication safer and enabling medical prescription assistance. French prescription texts very often contain ambiguities, syntactic variability, and colloquial expressions, which limits the effectiveness of classical machine learning approaches. Here, we study the use of Large Language Models (LLMs) to structure dosage texts, comparing methods based on prompt engineering and LLM fine-tuning with a “pre-LLM” system based on a named entity recognition and linking (NERL) algorithm. Our results show that only fine-tuned LLMs reach the accuracy of the reference model. Error analysis reveals that the two approaches are complementary: our NERL yields more precise structuring, but LLMs capture semantic nuances more effectively. We therefore propose the following hybrid model: call an LLM when confidence in the NERL output is low (<0.8) according to our own confidence score. This strategy achieves an accuracy of 91% while minimizing latency. Our results suggest that this hybrid approach improves the accuracy of dosage structuring while limiting computational cost, making it a scalable solution for real-world clinical use.
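The hybrid routing rule (fall back to the LLM when NERL confidence is below 0.8) can be sketched as follows. The `nerl` and `llm` callables, the field names, and the toy stand-ins are hypothetical placeholders; how the confidence score itself is computed is not described here and is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Fields = Dict[str, str]


@dataclass
class DosageResult:
    fields: Fields
    source: str        # "nerl" or "llm": which system produced the structure
    confidence: float  # NERL confidence score that drove the routing decision


def structure_dosage(
    text: str,
    nerl: Callable[[str], Tuple[Fields, float]],
    llm: Callable[[str], Fields],
    threshold: float = 0.8,
) -> DosageResult:
    """Run the cheaper NERL system first; call the LLM only when NERL is unsure."""
    fields, confidence = nerl(text)
    if confidence < threshold:
        # Low confidence: pay the LLM latency cost for this prescription only.
        return DosageResult(llm(text), "llm", confidence)
    return DosageResult(fields, "nerl", confidence)


# Toy stand-ins for the two systems (hypothetical, for illustration only).
def toy_nerl(text: str) -> Tuple[Fields, float]:
    if "matin et soir" in text:  # "morning and evening"
        return {"frequency": "2/day"}, 0.95
    return {}, 0.3


def toy_llm(text: str) -> Fields:
    return {"frequency": "as needed"}


print(structure_dosage("1 cp matin et soir", toy_nerl, toy_llm).source)  # nerl
print(structure_dosage("1 cp si besoin", toy_nerl, toy_llm).source)      # llm
```

Because the LLM is only invoked for the low-confidence fraction of prescriptions, average latency and compute stay close to those of the NERL system.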
Realignment techniques are often employed to enhance cross-lingual transfer in multilingual language models; however, they can sometimes degrade performance in languages that differ significantly from the fine-tuned source language. This paper introduces AlignFreeze, a method that freezes either the lower half or the upper half of the layers during realignment. Through controlled experiments on 4 tasks, 3 models, and 35 languages, we find that realignment affects all layers but can be most detrimental to the lower ones. Freezing the lower layers can prevent performance degradation. In particular, AlignFreeze improves Part-of-Speech (PoS) tagging performance in languages where full realignment fails: with XLM-R, it yields improvements of more than one standard deviation in accuracy in seven more languages than full realignment does.
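Freezing one half of the layer stack amounts to selecting which parameters receive no gradient during realignment. The sketch below assumes Hugging Face-style parameter names (`...encoder.layer.<i>....`) and leaves parameters outside the layer stack, such as embeddings, trainable; both choices are assumptions, and `params_to_freeze` is a hypothetical helper rather than the AlignFreeze code.

```python
import re
from typing import Iterable, List

# Assumption: parameter names like "roberta.encoder.layer.7.output.dense.weight".
LAYER_RE = re.compile(r"encoder\.layer\.(\d+)\.")


def params_to_freeze(names: Iterable[str], num_layers: int, half: str) -> List[str]:
    """Return the layer parameters to freeze during realignment.

    half="lower" selects layers [0, num_layers // 2); half="upper" selects the rest.
    Parameters that do not belong to a Transformer layer are never selected.
    """
    if half not in ("lower", "upper"):
        raise ValueError("half must be 'lower' or 'upper'")
    cutoff = num_layers // 2
    frozen = []
    for name in names:
        match = LAYER_RE.search(name)
        if match is None:
            continue  # e.g. embeddings or task head
        in_lower = int(match.group(1)) < cutoff
        if in_lower == (half == "lower"):
            frozen.append(name)
    return frozen


names = [
    "roberta.embeddings.word_embeddings.weight",
    "roberta.encoder.layer.0.attention.self.query.weight",
    "roberta.encoder.layer.5.output.dense.weight",
    "roberta.encoder.layer.6.output.dense.weight",
    "roberta.encoder.layer.11.output.dense.weight",
]
print(params_to_freeze(names, 12, "lower"))  # layers 0 and 5
```

With a PyTorch model, the result would typically be applied as `frozen = set(params_to_freeze((n for n, _ in model.named_parameters()), 12, "lower"))` followed by setting `p.requires_grad = n not in frozen` for each `(n, p)` in `model.named_parameters()` before the realignment step, then restoring trainability for task fine-tuning.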
Natural language tasks such as Named Entity Recognition (NER) on non-English clinical texts can be very time-consuming and expensive to tackle due to the lack of annotated data. Cross-lingual transfer (CLT) is a way to circumvent this issue: multilingual large language models can be fine-tuned on a specific task in one language and still provide high accuracy on the same task in another language. However, other methods that leverage translation models can also perform NER without annotated data in the target language, by translating either the training set or the test set. This paper compares cross-lingual transfer with these two alternative methods for clinical NER in French and German without any training data in those languages. To this end, we release MedNERF, a medical NER test set extracted from French drug prescriptions and annotated with the same guidelines as an English dataset. Through extensive experiments on this dataset and on a German medical dataset (Frei and Kramer, 2021), we show that translation-based methods can achieve performance similar to CLT but require more care in their design. And while they can take advantage of monolingual clinical language models, these do not guarantee better results than large general-purpose multilingual models, whether with cross-lingual transfer or translation.
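One design detail that makes translate-train harder than it looks is projecting entity labels onto the translated sentence. A common technique, assumed here and not necessarily the one used in this paper, is to wrap entities in markers before translation and parse the markers back out afterwards. The `translate` callable stands for a hypothetical MT system that preserves the markers; overlapping or nested spans are not handled.

```python
import re
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end exclusive, label)

MARKER_RE = re.compile(r"\[(\w+)\](.*?)\[/\1\]")


def add_markers(text: str, spans: List[Span]) -> str:
    """Wrap each (non-overlapping) entity span as [LABEL]...[/LABEL]."""
    out, prev = [], 0
    for start, end, label in sorted(spans):
        out.append(text[prev:start])
        out.append(f"[{label}]{text[start:end]}[/{label}]")
        prev = end
    out.append(text[prev:])
    return "".join(out)


def strip_markers(marked: str) -> Tuple[str, List[Span]]:
    """Remove markers and recover character offsets in the clean text."""
    clean, spans, pos, cursor = [], [], 0, 0
    for m in MARKER_RE.finditer(marked):
        between = marked[cursor:m.start()]
        clean.append(between)
        pos += len(between)
        entity = m.group(2)
        spans.append((pos, pos + len(entity), m.group(1)))
        clean.append(entity)
        pos += len(entity)
        cursor = m.end()
    clean.append(marked[cursor:])
    return "".join(clean), spans


def project_labels(text: str, spans: List[Span],
                   translate: Callable[[str], str]) -> Tuple[str, List[Span]]:
    """Translate a labeled sentence and carry its entity spans along."""
    return strip_markers(translate(add_markers(text, spans)))


# Toy "translator" for illustration; a real MT model may drop or move markers.
toy_translate = lambda s: s.replace("Take ", "Prendre ")
print(project_labels("Take paracetamol 500 mg",
                     [(5, 16, "DRUG"), (17, 23, "DOSE")], toy_translate))
```

In practice, sentences whose markers are lost or mismatched by the MT system would need to be filtered or repaired, which is one reason such pipelines require careful design.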
Without any explicit cross-lingual training data, multilingual language models can achieve cross-lingual transfer. One common way to improve this transfer is to perform realignment steps before fine-tuning, i.e., to train the model to build similar representations for pairs of words from translated sentences. However, such realignment methods were found not to always improve results across languages and tasks, which raises the question of whether aligned representations are truly beneficial for cross-lingual transfer. We provide evidence that alignment is significantly correlated with cross-lingual transfer across languages, models, and random seeds. We show that fine-tuning can have a significant impact on alignment, depending mainly on the downstream task and the model. Finally, we show that realignment can, in some instances, improve cross-lingual transfer, and we identify conditions under which realignment methods provide significant improvements. Namely, we find that realignment works better on tasks for which alignment is correlated with cross-lingual transfer when generalizing to a distant language and with smaller models, as well as when using a bilingual dictionary rather than FastAlign to extract realignment pairs. For example, for POS tagging between English and Arabic, realignment can bring a 15.8-point accuracy improvement on distilmBERT, even outperforming XLM-R Large by 1.7 points. We thus advocate for further research on realignment methods for smaller multilingual models as an alternative to scaling.
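Extracting realignment pairs with a bilingual dictionary, as opposed to a statistical aligner such as FastAlign, can be sketched as below. The rule of keeping a source word only when exactly one target token is a dictionary translation is an assumption made to avoid ambiguous pairs; `dictionary_pairs` and the toy dictionary are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def dictionary_pairs(src_tokens: List[str], tgt_tokens: List[str],
                     bilingual_dict: Dict[str, Set[str]]) -> List[Tuple[int, int]]:
    """Return (source index, target index) word pairs for realignment.

    A source word is kept only if exactly one target token is among its
    dictionary translations (matching is case-insensitive).
    """
    pairs = []
    for i, word in enumerate(src_tokens):
        translations = bilingual_dict.get(word.lower())
        if not translations:
            continue
        matches = [j for j, t in enumerate(tgt_tokens) if t.lower() in translations]
        if len(matches) == 1:  # skip ambiguous or unmatched words
            pairs.append((i, matches[0]))
    return pairs


toy_dict = {"the": {"le", "la"}, "cat": {"chat"}, "sleeps": {"dort"}}
print(dictionary_pairs(["The", "cat", "sleeps"], ["Le", "chat", "dort"], toy_dict))
# [(0, 0), (1, 1), (2, 2)]
```

The resulting index pairs identify which contextual representations the realignment objective would pull together; the objective itself (for example a contrastive loss) is not shown.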