Louis Estève
2024
Vector Spaces for Quantifying Disparity of Multiword Expressions in Annotated Text
Louis Estève | Agata Savary | Thomas Lavergne
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Multiword Expressions (MWEs) make a good case study for linguistic diversity due to their idiosyncratic nature. Defining MWE canonical forms as types, diversity may be measured notably through disparity, based on pairwise distances between types. To this end, we train static MWE-aware word embeddings for verbal MWEs in 14 languages, and we show interesting properties of these vector spaces. We use these vector spaces to implement the so-called functional diversity measure. We apply this measure to the results of several MWE identification systems. We find that, although MWE vector spaces are meaningful at a local scale, the disparity measure aggregating them at a global scale strongly correlates with the number of types, which calls into question its usefulness in the presence of simpler diversity metrics such as variety. We make the vector spaces we generated available.
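To make the contrast between variety and disparity concrete, here is a minimal Python sketch. It assumes each MWE type has already been mapped to a non-zero vector, and it uses the plain mean pairwise cosine distance as a stand-in for disparity. The names `variety`, `cosine_distance` and `mean_pairwise_disparity`, as well as the toy vectors, are hypothetical; this is not the functional diversity measure implemented in the paper, which may, for instance, weight pairs by type frequency.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def variety(types):
    """Variety: the number of distinct MWE types among the given occurrences."""
    return len(set(types))


def mean_pairwise_disparity(vectors):
    """Hypothetical disparity proxy: average cosine distance over all unordered type pairs.

    `vectors` maps each MWE type (e.g. a canonical form) to its embedding.
    Returns 0.0 when fewer than two types are present.
    """
    names = sorted(vectors)
    pairs = list(combinations(names, 2))
    if not pairs:
        return 0.0
    total = sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Toy, made-up 2-dimensional vectors for three hypothetical MWE types.
    toy_vectors = {
        "take_place": [1.0, 0.0],
        "make_decision": [0.8, 0.6],
        "kick_bucket": [0.0, 1.0],
    }
    occurrences = ["take_place", "make_decision", "take_place"]
    print("variety:", variety(occurrences))
    print("disparity:", mean_pairwise_disparity(toy_vectors))
```

Variety only counts types, whereas a distance-based disparity also depends on where those types sit in the vector space; the abstract reports that, once aggregated globally, the paper's disparity measure nonetheless tracked the number of types closely.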
2023
UO-LouTAL at SemEval-2023 Task 6: Lightweight Systems for Legal Processing
Sébastien Bosch | Louis Estève | Joanne Loo | Anne-Lyse Minard
Anne-Lyse Minard
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
This paper presents the work produced by students of the University of Orléans Master's program in Natural Language Processing through their participation in SemEval Task 6, LegalEval, which aims to enhance the capabilities of legal professionals through automated systems. Two of the three available sub-tasks, Rhetorical Role prediction (RR) and Legal Named Entity Recognition (L-NER), were tackled, with the explicit aim of developing lightweight and interpretable systems. For the L-NER sub-task, a CRF model was trained and augmented with post-processing rules for some named entity types. It obtained a macro F1 score of 0.74 on the DEV set and 0.64 on the evaluation set. For the RR sub-task, two sentence classification systems were built: one based on the Bag-of-Words technique with the L-NER system output integrated, the other using a sentence-transformer approach. Rule-based post-processing then converted the outputs of the sentence classification systems into RR predictions. The better-performing Bag-of-Words system obtained a macro F1 score of 0.49 on the DEV set and 0.57 on the evaluation set.
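The scores above are macro-averaged F1: F1 is computed separately for each label and then averaged with equal weight per label, so rare labels count as much as frequent ones. Below is a minimal Python sketch of that standard definition for single-label classification, such as assigning one RR label per sentence. The function name `macro_f1` and the toy labels are hypothetical, and whether the LegalEval scorer also includes labels that appear only in the predictions (as done here) is an assumption. Entity-level L-NER scoring would additionally require matching predicted spans to gold spans, which this sketch does not do.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over single-label predictions (hypothetical helper).

    Per label, F1 = 2*TP / (2*TP + FP + FN); the result is the unweighted
    mean over every label seen in either the gold or the predicted sequence.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    labels = sorted(set(gold) | set(pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy gold and predicted labels, one per sentence (made up).
    gold = ["A", "A", "B", "C"]
    pred = ["A", "B", "B", "C"]
    print("macro F1:", macro_f1(gold, pred))
```

Averaging per label rather than per sentence is what lets a system with good accuracy on frequent labels still receive a low macro F1 if it handles rare labels poorly.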