Laura Alonso Alemany

Also published as: Laura Alonso, Laura Alonso i Alemany


2022

RoBERTuito: a pre-trained language model for social media text in Spanish
Juan Manuel Pérez | Damián Ariel Furman | Laura Alonso Alemany | Franco M. Luque
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Since BERT appeared, Transformer language models and transfer learning have become state-of-the-art for natural language processing tasks. Recently, some works have focused on pre-training specially crafted models for particular domains, such as scientific papers, medical documents, and user-generated texts, among others. These domain-specific models have been shown to improve performance significantly in most tasks; however, for languages other than English, such models are not widely available. In this work, we present RoBERTuito, a pre-trained language model for user-generated text in Spanish, trained on over 500 million tweets. Experiments on a benchmark of tasks involving user-generated text showed that RoBERTuito outperformed other pre-trained language models in Spanish. In addition, our model has some cross-lingual abilities, achieving top results for English-Spanish tasks of the Linguistic Code-Switching Evaluation benchmark (LinCE) and competitive performance against monolingual models in English Twitter tasks. To facilitate further research, we make RoBERTuito publicly available at the HuggingFace model hub together with the dataset used to pre-train it.

2018

Increasing Argument Annotation Reproducibility by Using Inter-annotator Agreement to Improve Guidelines
Milagro Teruel | Cristian Cardellino | Fernando Cardellino | Laura Alonso Alemany | Serena Villata
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2017

As bases de dados verbais ADESSE e ViPEr: uma análise constrastiva das construções locativas em espanhol e em português (The verbal databases ADESSE and ViPEr: a contrastive analysis of locative constructs in Spanish and Portuguese)[In Portuguese]
Roana Rodrigues | Oto Vale | Laura Alonso Alemany
Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology

Legal NERC with ontologies, Wikipedia and curriculum learning
Cristian Cardellino | Milagro Teruel | Laura Alonso Alemany | Serena Villata
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

In this paper, we present a Wikipedia-based approach to develop resources for the legal domain. We establish a mapping between a legal domain ontology, LKIF (Hoekstra et al. 2007), and a Wikipedia-based ontology, YAGO (Suchanek et al. 2007), and through that we populate LKIF. Moreover, we use the mentions of those entities in Wikipedia text to train a specific Named Entity Recognizer and Classifier. We find that this classifier works well on Wikipedia but, as could be expected, performance decreases on a corpus of judgments of the European Court of Human Rights. However, this tool will be used as a preprocessing step for human annotation. We resort to a technique called “curriculum learning”, aimed at overcoming problems of overfitting by learning increasingly complex concepts. However, we find that in this particular setting, the method works best by learning from most specific to most general concepts, not the other way round.

2010

Data-driven computational linguistics at FaMAF-UNC, Argentina
Laura Alonso Alemany | Gabriel Infante-Lopez
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

IRASubcat, a highly parametrizable, language independent tool for the acquisition of verbal subcategorization information from corpus
Ivana Romina Altamirano | Laura Alonso Alemany
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

2006

The Sensem Corpus: a Corpus Annotated at the Syntactic and Semantic Level
Irene Castellón | Ana Fernández-Montraveta | Gloria Vázquez | Laura Alonso Alemany | Joan Antoni Capilla
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The primary aim of the project SENSEM (Sentence Semantics, BFF2003-06456) is the construction of a Lexical Data Base illustrating the syntactic and semantic behavior of each of the senses of the 250 most frequent verbs of Spanish. With this objective in mind, we are currently building an annotated corpus consisting of sentences extracted from the electronic version of the newspaper El Periódico de Catalunya, totalling approximately 1 million words, with 100 examples of each verb. By the time of the conference, we will be about to complete the annotation of 25,000 sentences, which means roughly a corpus of 800,000 words. Approximately 400,000 of them will have been revised. We expect to make the corpus publicly available by the end of 2006.

2004

Multiple Sequence Alignment for Characterizing the Lineal Structure of Revision
Laura Alonso | Irene Castellón | Jordi Escribano | Xavier Messeguer | Lluís Padró
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

We present a first approach to the application of a data mining technique, Multiple Sequence Alignment, to the systematization of a polemic aspect of discourse, namely, the expression of contrast, concession, counterargument and semantically similar discursive relations. The representation of the phenomena under study is carried out by very simple techniques, mostly pattern-matching, but the results allow us to draw insightful conclusions on the organization of this aspect of discourse: equivalence classes of discourse markers are established, and systematic patterns are discovered, which will be applied in enhancing a discursive parser.

Re-using High-quality Resources for Continued Evaluation of Automated Summarization Systems
Laura Alonso | Maria Fuentes | Marc Massot | Horacio Rodríguez
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Semantic Categorization of Spanish Se-constructions
Glòria Vázquez | Ana Fernández Montraveta | Irene Castellón | Laura Alonso
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

A Framework for Feature based Description of Low level Discourse
Laura Alonso Alemany | Ezequiel Andujar Hinojosa | Robert Sola Salvatierra
Proceedings of the Workshop on Discourse Annotation

Knowledge intensive e-mail summarization in CARPANTA
Laura Alonso | Irene Castellón | Bernardino Casas | Lluís Padró
Proceedings of the ACL Interactive Poster and Demonstration Sessions

2003

Cohesion and coherence for Automatic Summarization
Laura Alonso i Alemany | Maria Fuentes Fort
Student Research Workshop

Clustering Adjectives for Class Discovery
Gemma Boleda Torrent | Laura Alonso i Alemany
Student Research Workshop