Yannick Toussaint
Biomedical event extraction can be divided into three main subtasks: (1) biomedical event trigger detection, (2) biomedical argument identification and (3) event construction. This work focuses on the first two subtasks. For the first subtask, we analyze a set of transformer language models that are commonly used in the biomedical domain to evaluate and compare their capacity for event trigger detection. We fine-tune the models on seven manually annotated corpora to assess their performance in different biomedical subdomains. SciBERT emerged as the highest-performing model, presenting a slight improvement over baseline models. Then, for the second subtask, we construct a knowledge graph (KG) from the biomedical corpora and integrate its KG embeddings into SciBERT to enrich its semantic information. We demonstrate that adding the KG embeddings to the model improves argument identification performance by around 20%, and by around 15% compared to two baseline models. Our results suggest that fine-tuning a transformer model pretrained from scratch on biomedical and general data makes it possible to detect event triggers and identify arguments across different biomedical subdomains, thereby improving generalization. Furthermore, integrating KG embeddings into the model can significantly improve biomedical event argument identification, outperforming the baseline models.
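As an illustration of the kind of integration described above, the sketch below concatenates precomputed KG embeddings with SciBERT's token representations before a per-token argument classifier. The fusion by concatenation, the embedding dimension, the number of labels and all names are assumptions made for illustration, not the paper's exact architecture.

```python
# Minimal sketch: enriching SciBERT token representations with precomputed
# KG embeddings before an argument-identification classifier.
# kg_dim, num_labels and the concatenation strategy are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ArgumentIdentifier(nn.Module):
    def __init__(self, kg_dim=128, num_labels=3):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased")
        hidden = self.encoder.config.hidden_size
        self.classifier = nn.Linear(hidden + kg_dim, num_labels)

    def forward(self, input_ids, attention_mask, kg_embeddings):
        # kg_embeddings: (batch, seq_len, kg_dim), aligned to subword tokens;
        # zeros for tokens that have no KG entry.
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        fused = torch.cat([out.last_hidden_state, kg_embeddings], dim=-1)
        return self.classifier(fused)  # per-token argument-role logits

tokenizer = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
enc = tokenizer("IL-2 expression requires NF-kB binding.", return_tensors="pt")
model = ArgumentIdentifier()
kg = torch.zeros(1, enc["input_ids"].shape[1], 128)  # placeholder KG embeddings
logits = model(enc["input_ids"], enc["attention_mask"], kg)
```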
Natural language processing tasks like Named Entity Recognition (NER) in the clinical domain on non-English texts can be very time-consuming and expensive due to the lack of annotated data. Cross-lingual transfer (CLT) is a way to circumvent this issue thanks to the ability of multilingual large language models to be fine-tuned on a specific task in one language and to provide high accuracy for the same task in another language. However, other methods leveraging translation models can be used to perform NER without annotated data in the target language, by translating either the training set or the test set. This paper compares cross-lingual transfer with these two alternative methods to perform clinical NER in French and in German without any training data in those languages. To this end, we release MedNERF, a medical NER test set extracted from French drug prescriptions and annotated with the same guidelines as an English dataset. Through extensive experiments on this dataset and on a German medical dataset (Frei and Kramer, 2021), we show that translation-based methods can achieve performance similar to CLT but require more care in their design. While they can take advantage of monolingual clinical language models, those do not guarantee better results than large general-purpose multilingual models, whether with cross-lingual transfer or translation.
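The cross-lingual transfer setup can be sketched as follows: a multilingual encoder is fine-tuned for token classification on English clinical data, then applied unchanged to French text. The tag set, the choice of xlm-roberta-base and the training details are placeholders, not the paper's exact configuration.

```python
# Sketch of cross-lingual transfer (CLT): fine-tune a multilingual encoder on
# English clinical NER, then apply it as-is to French prescriptions.
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

labels = ["O", "B-DRUG", "I-DRUG", "B-DOSAGE", "I-DOSAGE"]  # assumed tag set
model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={l: i for i, l in enumerate(labels)},
)

# ... fine-tune `model` on English annotated prescriptions (e.g., with Trainer) ...

# Zero-shot application to a French prescription, with no French labels:
ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")
print(ner("Prendre 500 mg de paracétamol trois fois par jour."))
```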
Without any explicit cross-lingual training data, multilingual language models can achieve cross-lingual transfer. One common way to improve this transfer is to perform realignment steps before fine-tuning, i.e., to train the model to build similar representations for pairs of words from translated sentences. However, such realignment methods were found to not always improve results across languages and tasks, which raises the question of whether aligned representations are truly beneficial for cross-lingual transfer. We provide evidence that alignment is actually significantly correlated with cross-lingual transfer across languages, models and random seeds. We show that fine-tuning can have a significant impact on alignment, depending mainly on the downstream task and the model. Finally, we show that realignment can, in some instances, improve cross-lingual transfer, and we identify the conditions in which realignment methods provide significant improvements. Namely, we find that realignment works better on tasks for which alignment is correlated with cross-lingual transfer when generalizing to a distant language, with smaller models, and when using a bilingual dictionary rather than FastAlign to extract realignment pairs. For example, for POS tagging between English and Arabic, realignment can bring a +15.8 accuracy improvement on distilmBERT, even outperforming XLM-R Large by 1.7 points. We thus advocate for further research on realignment methods for smaller multilingual models as an alternative to scaling.
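A minimal sketch of a realignment step, assuming word-level alignment pairs are already available (e.g., from a bilingual dictionary) and using a simple MSE objective to pull paired contextual embeddings together; the realignment losses actually studied in the paper may differ.

```python
# Realignment sketch: bring contextual embeddings of aligned word pairs from
# translated sentences closer together before fine-tuning.
import torch
from transformers import AutoTokenizer, AutoModel

name = "distilbert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

en, ar = "the cat sleeps", "القطة تنام"
pairs = [(1, 0)]  # assumed alignment: English word "cat" <-> Arabic "القطة"

def word_embeddings(sentence):
    enc = tok(sentence, return_tensors="pt")
    hidden = model(**enc).last_hidden_state[0]
    # average subword vectors per word (word_ids maps subwords to words)
    words = {}
    for i, w in enumerate(enc.word_ids()):
        if w is not None:
            words.setdefault(w, []).append(hidden[i])
    return {w: torch.stack(v).mean(0) for w, v in words.items()}

src, tgt = word_embeddings(en), word_embeddings(ar)
loss = sum(torch.nn.functional.mse_loss(src[i], tgt[j]) for i, j in pairs)
loss.backward()
optimizer.step()
```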
We apply Formal Concept Analysis (FCA) to organize and improve the quality of Démonette2, a French derivational database, by detecting both missing and spurious derivations in the database. We represent each derivational family as a graph. Since a subgraph relation exists among derivational families, FCA can group families and represent them in a partially ordered set (poset). This poset is also useful for improving the database: a family is regarded as a possible anomaly (meaning that it may have missing and/or spurious derivations) if its derivational graph is almost, but not completely, identical to that of a large number of other families.
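The grouping step can be illustrated without a full FCA toolkit: if each derivational family is reduced to a set of derivation edges, families can be partially ordered by inclusion, and families that are nearly but not exactly identical to many others can be flagged as possible anomalies. The toy families and the similarity criterion below are invented for illustration only.

```python
# Simplified stand-in for the FCA step on derivational families.
from itertools import combinations

families = {
    "laver":    {("V", "V-age"), ("V", "V-eur"), ("V", "V-able")},
    "nettoyer": {("V", "V-age"), ("V", "V-eur"), ("V", "V-able")},
    "plier":    {("V", "V-age"), ("V", "V-able")},                 # proper subgraph
    "casser":   {("V", "V-age"), ("V", "V-eur"), ("V", "V-ure")},  # near-identical
}

# Poset edges: family A sits below family B when A's graph is a subgraph of B's.
poset = [(a, b) for a, b in combinations(families, 2) if families[a] < families[b]] \
      + [(b, a) for a, b in combinations(families, 2) if families[b] < families[a]]

def near_identical(a, b):
    """Almost, but not completely, identical: large overlap, no inclusion."""
    sa, sb = families[a], families[b]
    return not (sa <= sb or sb <= sa) and len(sa & sb) >= 2

# Anomaly candidates: families near-identical to many other families.
candidates = {f: sum(near_identical(f, g) for g in families if g != f)
              for f in families}
print(poset)
print({f: n for f, n in candidates.items() if n > 0})
```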
We introduce four tasks designed to determine which sentence encoders best capture discourse properties of sentences from scientific abstracts, namely coherence and cohesion between clauses of a sentence, and discourse relations within sentences. We show that even if contextual encoders such as BERT or SciBERT encode coherence between discourse units, they do not help to predict three discourse relations commonly used in scientific abstracts. We discuss what these results underline, namely that these discourse relations rely on particular phrasing that allows non-contextual encoders to perform well.
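One of the probing setups can be sketched as follows: sentence representations from a frozen encoder feed a simple classifier that predicts the discourse relation expressed within the sentence. The relation labels, toy examples and mean pooling are placeholders rather than the paper's actual tasks or corpora.

```python
# Probing sketch: predict the within-sentence discourse relation from a
# frozen encoder's sentence representation.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("allenai/scibert_scivocab_uncased")
enc = AutoModel.from_pretrained("allenai/scibert_scivocab_uncased").eval()

def embed(sentence):
    with torch.no_grad():
        out = enc(**tok(sentence, return_tensors="pt"))
    return out.last_hidden_state.mean(1).squeeze(0).numpy()  # mean pooling

train = [
    ("We removed outliers, thus the variance decreased.", "result"),
    ("The model is small, whereas training is slow.", "contrast"),
    ("We annotate spans, i.e. we mark clause boundaries.", "elaboration"),
]
X = [embed(s) for s, _ in train]
y = [label for _, label in train]
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.predict([embed("The corpus is large, whereas the lexicon is small.")]))
```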
In this paper, we investigate similarities between discourse and argumentation structures by aligning subtrees in a corpus containing both annotations. Contrary to previous work, we focus on comparing sub-structures and not only matches between relations. Using data mining techniques, we show that discourse and argumentation most often align well, and that the double annotation allows us to derive a mapping between structures. Moreover, this approach enables the study of similarities between discourse structures and of differences in their expressive power.
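A toy sketch of the sub-structure comparison: if both annotations are trees over the same elementary units, enumerating each internal node's covered span together with its relation label makes it possible to find spans on which the two structures align and to read off a relation mapping. The trees and labels below are invented for illustration, not taken from the corpus.

```python
# Toy sub-structure alignment between a discourse tree and an argumentation
# tree annotated over the same elementary units.

def subtrees(node):
    """Return ([(frozenset_of_leaves, relation), ...], set_of_leaves)."""
    if isinstance(node, int):          # leaf = elementary unit id
        return [], {node}
    relation, children = node
    found, leaves = [], set()
    for child in children:
        f, l = subtrees(child)
        found += f
        leaves |= l
    found.append((frozenset(leaves), relation))
    return found, leaves

discourse = ("Contrast", [("Elaboration", [1, 2]), 3])
argumentation = ("Attack", [("Support", [1, 2]), 3])

d, _ = subtrees(discourse)
a, _ = subtrees(argumentation)
aligned = [(set(sd), rd, ra) for sd, rd in d for sa, ra in a if sd == sa]
print(aligned)  # spans where the two structures align, with both relation labels
```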
Transfer learning (TL) aims to enhance machine learning performance on a problem by reusing labeled data originally designed for a related problem. In particular, domain adaptation consists, for a specific task, in reusing training data developed for the same task but for a distinct domain. This is particularly relevant to applications of deep learning in Natural Language Processing, because they usually require large annotated corpora that may not exist for the targeted domain but do exist for related domains. In this paper, we experiment with TL for the task of Relation Extraction (RE) from biomedical texts, using the TreeLSTM model. We empirically show the impact of TreeLSTM alone and with domain adaptation, obtaining better performance than the state of the art on two biomedical RE tasks and equal performance on two others for which little annotated data is available. Furthermore, we propose an analysis of the role that syntactic features may play in TL for RE.
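The domain-adaptation schedule itself can be sketched independently of the TreeLSTM: the model is first trained on the larger source-domain RE corpus and then fine-tuned on the small target-domain corpus. The placeholder classifier, learning rates and random toy data below are assumptions; the paper's actual model is a TreeLSTM over syntactic parses.

```python
# Sketch of the domain-adaptation schedule for relation extraction:
# pre-train on a related source domain, then adapt on the small target domain.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 2))
criterion = nn.CrossEntropyLoss()

def run_epochs(loader, lr, epochs):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for features, labels in loader:
            opt.zero_grad()
            loss = criterion(model(features), labels)
            loss.backward()
            opt.step()

# Toy stand-ins for the source-domain and target-domain RE datasets.
source = [(torch.randn(32, 768), torch.randint(0, 2, (32,))) for _ in range(100)]
target = [(torch.randn(8, 768), torch.randint(0, 2, (8,))) for _ in range(5)]

run_epochs(source, lr=1e-3, epochs=3)    # pre-train on the related domain
run_epochs(target, lr=1e-4, epochs=10)   # adapt on the few target examples
```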
Among the research devoted to terminology and word sense disambiguation, little attention has been paid to the ambiguity of term occurrences. Even if a lexical unit is indeed a term of the domain, it is not true, even in a specialised corpus, that all its occurrences are terminological: some occurrences are terminological and others are not. A global decision at the corpus level about the terminological status of all occurrences of a lexical unit would therefore be erroneous. In this paper, we propose three original methods to characterise the ambiguity of term occurrences in the domain of social sciences for French. These methods model the context of the term occurrences differently: the first relies on text mining, the second on textometry, and the third on text genre properties. The experimental results show the potential of the proposed approaches and provide an opportunity to discuss their hybridisation.
This article presents a systematic and scientific method for analysing the documents that make up a judicial investigation file. The objective of this approach is to give the investigating judge new means of assessing the coherence, inconsistencies, stability or variations in testimonies, which should allow him or her to define leads for further investigation. We describe the work we carried out on a real case file and then propose a method for analysing the results.