In this work, we conduct a detailed analysis of the performance of legal-oriented pre-trained language models (PLMs). We examine the interplay between their original objective, acquired knowledge, and legal language understanding capacities, which we define as upstream, probing, and downstream performance, respectively. We consider not only the models' size but also the pre-training corpora used as important dimensions in our study. To this end, we release a multinational English legal corpus (LeXFiles) and a legal knowledge probing benchmark (LegalLAMA) to facilitate training and detailed analysis of legal-oriented PLMs. We release two new legal PLMs trained on LeXFiles and evaluate them alongside others on LegalLAMA and LexGLUE. We find that probing performance strongly correlates with upstream performance in related legal topics. On the other hand, downstream performance is mainly driven by the model's size and prior legal knowledge, which can be estimated by upstream and probing performance. Based on these findings, we conclude that both dimensions are important for those seeking to develop domain-specific PLMs.
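To make the probing setup concrete, the following is a minimal sketch of a LegalLAMA-style cloze query against a masked PLM. The checkpoint name and the example sentence are illustrative assumptions, not the paper's exact configuration; the released legal PLMs would be substituted for the generic model shown here.

```python
# Sketch: LegalLAMA-style cloze probing of a masked PLM.
# "roberta-base" is a placeholder; a legal-oriented PLM would be used instead.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

# Illustrative cloze query probing a legal concept; <mask> is RoBERTa's mask token.
query = "The court granted the defendant's motion for summary <mask>."
for pred in fill_mask(query, top_k=5):
    print(f"{pred['token_str']:>12}  {pred['score']:.3f}")
```

Ranking the gold term among the top predictions in queries like this is one straightforward way to quantify the legal knowledge a PLM has acquired during pre-training.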
Cancel culture as an Internet phenomenon has previously been explored from social and legal science perspectives. This paper demonstrates how Natural Language Processing tasks can be derived from this previous work, outlining techniques by which cancel culture can be measured, identified, and evaluated. As part of this paper, we introduce a first cancel culture data set of over 2.3 million tweets and a framework to enlarge it further. We provide a detailed analysis of this data set and propose a set of features, based on various models including sentiment analysis and emotion detection, that can help characterize cancel culture.
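As a rough illustration of how such model-based features could be derived per tweet, the sketch below combines off-the-shelf sentiment and emotion classifiers into a flat feature dictionary. The specific checkpoints and the example tweet are assumptions for illustration, not the paper's exact feature pipeline.

```python
# Sketch: per-tweet sentiment and emotion features, one plausible way to build
# the proposed feature set. Checkpoints are illustrative public models, not the
# paper's exact setup.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")  # default English sentiment model
emotion = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",  # assumed public emotion model
    top_k=None,  # return scores for all emotion labels
)

def tweet_features(text: str) -> dict:
    """Map a single tweet to a flat dictionary of model-based features."""
    feats = {}
    s = sentiment(text)[0]
    feats["sentiment_label"] = s["label"]
    feats["sentiment_score"] = s["score"]
    for e in emotion(text)[0]:
        feats[f"emotion_{e['label']}"] = e["score"]
    return feats

print(tweet_features("They really need to be cancelled after what they said."))
```

Features of this kind, aggregated over a user or a hashtag, are one way the data set could be used to study how cancel culture episodes unfold over time.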