Lluís Padró

Also published as: L. Padro, L. Padró, Lluis Padro, Lluis Padró

2024

Fine-Tuning Open Access LLMs for High-Precision NLU in Goal-Driven Dialog Systems
Lluís Padró | Roser Saurí
Proceedings of the Second International Workshop Towards Digital Language Equality (TDLE): Focusing on Sustainability @ LREC-COLING 2024

This paper presents a set of experiments on fine-tuning LLMs to produce high-precision semantic representations for the NLU component of a dialog system front-end. The aim of this research is threefold: First, we want to explore the capabilities of LLMs on real, industry-based use cases that involve complex data and strict requirements on results. Since the LLM output should be usable by the application back-end, the produced semantic representation must satisfy strict format and consistency requirements. Second, we want to evaluate the cost-benefit of open-source LLMs, that is, the feasibility of running this kind of model on machines affordable to small-medium enterprises (SMEs), in order to assess how far these organizations can go without depending on the large players controlling the market, and with a moderate use of computation resources. Finally, we also want to assess the language scalability of the LLMs in this kind of application; specifically, whether a multilingual model is able to cast patterns learnt from one language to other ones (with special attention to underresourced languages), thus reducing required training data and computation costs. This work was carried out within an R&D context of assisting a real company in defining its NLU model strategy, and thus the results have a practical, industry-level focus.

2023

Women Wearing Lipstick: Measuring the Bias Between an Object and Its Related Gender
Ahmed Sabir | Lluís Padró
Findings of the Association for Computational Linguistics: EMNLP 2023

In this paper, we investigate the impact of objects on gender bias in image captioning systems. Our results show that only gender-specific objects have a strong gender bias (e.g., women-lipstick). In addition, we propose a visual semantic-based gender score that measures the degree of bias and can be used as a plug-in for any image captioning system. Our experiments demonstrate the utility of the gender score, since we observe that our score can measure the bias relation between a caption and its related gender; therefore, our score can be used as an additional metric to the existing Object Gender Co-Occ approach.

2022

Belief Revision Based Caption Re-ranker with Visual Semantic Information
Ahmed Sabir | Francesc Moreno-Noguer | Pranava Madhyastha | Lluís Padró
Proceedings of the 29th International Conference on Computational Linguistics

In this work, we focus on improving the captions generated by image-caption generation systems. We propose a novel re-ranking approach that leverages visual-semantic measures to identify the ideal caption that maximally captures the visual information in the image. Our re-ranker utilizes the Belief Revision framework (Blok et al., 2003) to calibrate the original likelihood of the top-n captions by explicitly exploiting semantic relatedness between the depicted caption and the visual context. Our experiments demonstrate the utility of our approach, where we observe that our re-ranker can enhance the performance of a typical image-captioning system without the need for any additional training or fine-tuning.

2019

Semantic Relatedness Based Re-ranker for Text Spotting
Ahmed Sabir | Francesc Moreno | Lluís Padró
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Applications such as textual entailment, plagiarism detection or document clustering rely on the notion of semantic similarity, and are usually approached with dimension reduction techniques like LDA or with embedding-based neural approaches. We present a scenario where semantic similarity is not enough, and we devise a neural approach to learn semantic relatedness. The scenario is text spotting in the wild, where a text in an image (e.g. street sign, advertisement or bus destination) must be identified and recognized. Our goal is to improve the performance of vision systems by leveraging semantic information. Our rationale is that the text to be spotted is often related to the image context in which it appears (word pairs such as Delta-airplane, or quarters-parking are not similar, but are clearly related). We show how learning a word-to-word or word-to-sentence relatedness score can improve the performance of text spotting systems by up to 2.9 points, outperforming other measures in a benchmark dataset.

2018

Coreference Resolution in FreeLing 4.0
Montserrat Marimon | Lluís Padró | Jordi Turmo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Challenges and Opportunities of Applying Natural Language Processing in Business Process Management
Han van der Aa | Josep Carmona | Henrik Leopold | Jan Mendling | Lluís Padró
Proceedings of the 27th International Conference on Computational Linguistics

The Business Process Management (BPM) field focuses on the coordination of labor so that organizational processes are smoothly executed in a way that products and services are properly delivered. At the same time, NLP has reached a maturity level that enables its widespread application in many contexts, thanks to publicly available frameworks. In this position paper, we show how NLP has the potential to raise the benefits of BPM practices at different levels. Instead of being exhaustive, we show selected key challenges where a successful application of NLP techniques would facilitate the automation of particular tasks that nowadays require a significant effort to accomplish. Finally, we report on applications that consider both the process perspective and its enhancement through NLP.

2017

Morphological Analysis of the Dravidian Language Family
Arun Kumar | Ryan Cotterell | Lluís Padró | Antoni Oliver
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

The Dravidian languages are one of the most widely spoken language families in the world, yet there are very few annotated resources available to NLP researchers. To remedy this, we create DravMorph, a corpus annotated for morphological segmentation and part-of-speech. Additionally, we exploit novel features and higher-order models to set state-of-the-art results on these corpora on both tasks, beating techniques proposed in the literature by as much as 4 points in segmentation F1.

2015

Enhancing FreeLing Rule-Based Dependency Grammars with Subcategorization Frames
Marina Lloberes | Irene Castellón | Lluís Padró
Proceedings of the Third International Conference on Dependency Linguistics (Depling 2015)

Suitability of ParTes Test Suite for Parsing Evaluation
Marina Lloberes | Irene Castellón | Lluís Padró
Proceedings of the 14th International Conference on Parsing Technologies

Joint Bayesian Morphology Learning for Dravidian Languages
Arun Kumar | Lluís Padró | Antoni Oliver
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects

Learning Agglutinative Morphology of Indian Languages with Linguistically Motivated Adaptor Grammars
Arun Kumar | Lluís Padró | Antoni Oliver
Proceedings of the International Conference Recent Advances in Natural Language Processing

2014

In this paper we introduce TweetNorm_es, an annotated corpus of tweets in Spanish, which we make publicly available under the terms of the CC-BY license. This corpus is intended for development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort from different research groups. In this paper we describe the methodology defined to build the corpus as well as the guidelines followed in the annotation process. We also present a brief overview of the Tweet-Norm shared task, as the first evaluation environment where the corpus was used.

This paper presents the linguistic analysis tools and their infrastructure developed within the XLike project. The main goal of the implemented tools is to provide a set of functionalities for supporting some of the main objectives of XLike, such as enabling cross-lingual services for publishers, media monitoring or developing new business intelligence applications. The services cover seven major and minor languages: English, German, Spanish, Chinese, Catalan, Slovenian, and Croatian. These analyzers are provided as web services following a lightweight SOA architecture approach; they are publicly callable and are catalogued in META-SHARE.

Squibs: Automatic Selection of HPSG-Parsed Sentences for Treebank Construction
Montserrat Marimon | Núria Bel | Lluís Padró
Computational Linguistics, Volume 40, Issue 3 - September 2014

2013

A Constraint-Based Hypergraph Partitioning Approach to Coreference Resolution
Emili Sapena | Lluís Padró | Jordi Turmo
Computational Linguistics, Volume 39, Issue 4 - December 2013

2012

Highlighting relevant concepts from Topic Signatures
Montse Cuadros | Lluís Padró | German Rigau
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper presents deepKnowNet, a new fully automatic method for building highly dense and accurate knowledge bases from existing semantic resources. Basically, the method applies a knowledge-based Word Sense Disambiguation algorithm to assign the most appropriate WordNet sense to large sets of topically related words acquired from the web, named TSWEB. This Word Sense Disambiguation algorithm is the personalized PageRank algorithm implemented in UKB. This new method improves by automatic means the current content of WordNet by creating large volumes of new and accurate semantic relations between synsets. KnowNet was our first attempt towards the acquisition of large volumes of semantic relations. However, KnowNet had some limitations that have been overcome with deepKnowNet. deepKnowNet disambiguates the first hundred words of all Topic Signatures from the web (TSWEB). In this case, the method highlights the most relevant word senses of each Topic Signature and filters out the ones that are not so related to the topic. In fact, the knowledge it contains outperforms any other resource when it is empirically evaluated in a common framework based on a similarity task annotated with human judgements.

FreeLing 3.0: Towards Wider Multilinguality
Lluís Padró | Evgeny Stanilovsky
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

FreeLing is an open-source multilingual language processing library providing a wide range of analyzers for several languages. It offers text processing and language annotation facilities to NLP application developers, lowering the cost of building those applications. FreeLing is customizable, extensible, and has a strong orientation to real-world applications in terms of speed and robustness. Developers can use the default linguistic resources (dictionaries, lexicons, grammars, etc.), extend/adapt them to specific domains, or --since the library is open source-- develop new ones for specific languages or special application needs. This paper describes the general architecture of the library, presents the major changes and improvements included in FreeLing version 3.0, and summarizes some relevant industrial projects in which it has been used.

2011

FreeLing: open-source natural language processing for research and development
Lluís Padró
Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation

Extending the tool, or how to annotate historical language varieties
Cristina Sánchez-Marco | Gemma Boleda | Lluís Padró
Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities

RelaxCor Participation in CoNLL Shared Task on Coreference Resolution
Emili Sapena | Lluís Padró | Jordi Turmo
Proceedings of the Fifteenth Conference on Computational Natural Language Learning: Shared Task

2010

FreeLing 2.1: Five Years of Open-source Language Processing Tools
Lluís Padró | Miquel Collado | Samuel Reese | Marina Lloberes | Irene Castellón
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

FreeLing is an open-source multilingual language processing library providing a wide range of language analyzers for several languages. It offers text processing and language annotation facilities to natural language processing application developers, simplifying the task of building those applications. FreeLing is customizable and extensible. Developers can use the default linguistic resources (dictionaries, lexicons, grammars, etc.) directly, or extend them, adapt them to specific domains, or even develop new ones for specific languages. This paper overviews the recent history of this tool, summarizes the improvements and extensions incorporated in the latest version, and depicts the architecture of the library. Special focus is brought to the fact and consequences of the library being open-source: After five years and over 35,000 downloads, a growing user community has extended the initial three languages (English, Spanish and Catalan) to eight (adding Galician, Italian, Welsh, Portuguese, and Asturian), proving that the collaborative open model is a productive approach for the development of NLP tools and resources.

Wikicorpus: A Word-Sense Disambiguated Multilingual Wikipedia Corpus
Samuel Reese | Gemma Boleda | Montse Cuadros | Lluís Padró | German Rigau
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This article presents a new freely available trilingual corpus (Catalan, Spanish, English) that contains large portions of the Wikipedia and has been automatically enriched with linguistic information. To our knowledge, this is the largest such corpus that is freely available to the community: In its present version, it contains over 750 million words. The corpora have been annotated with lemma and part of speech information using the open source library FreeLing. Also, they have been sense annotated with the state of the art Word Sense Disambiguation algorithm UKB. As UKB assigns WordNet senses, and WordNet has been aligned across languages via the InterLingual Index, this sort of annotation opens the way to massive explorations in lexical semantics that were not possible before. We present a first attempt at creating a trilingual lexical resource from the sense-tagged Wikipedia corpora, namely, WikiNet. Moreover, we present two by-products of the project that are of use for the NLP community: An open source Java-based parser for Wikipedia pages developed for the construction of the corpus, and the integration of the WSD algorithm UKB in FreeLing.

Spanish FreeLing Dependency Grammar
Marina Lloberes | Irene Castellón | Lluís Padró
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper presents the development of an open-source Spanish Dependency Grammar implemented in the FreeLing environment. This grammar was designed as a resource for NLP applications that require a step further in natural language automatic analysis, as is the case of Spanish-to-Basque translation. The development of wide-coverage rule-based grammars using linguistic knowledge contributes to extending the existing Spanish deep parsers collection, which sometimes is limited. Spanish FreeLing Dependency Grammar, named EsTxala, provides deep and robust parse trees, solving attachments for any structure and assigning syntactic functions to dependencies. These steps are dealt with hand-written rules based on linguistic knowledge. As a result, FreeLing Dependency Parser gives a unique analysis as a dependency tree for each sentence analyzed. Since it is a resource open to the scientific community, exhaustive grammar evaluation is being done to determine its accuracy as well as strategies for its maintenance and improvement. In this paper, we show the results of an experimental evaluation carried out over EsTxala in order to test our evaluation methodology.

In this paper, we present a brief snapshot of the state of affairs in computational processing of Catalan and the initiatives that are starting to take place in an effort to bring the field a step forward, by making a better and more efficient use of the already existing resources and tools, by bridging the gap between research and market, and by establishing periodical meeting points for the community. In particular, we present the results of the First Workshop on the Computational Processing of Catalan, which succeeded in putting together a fair representation of the research in the area, and received attention from both the industry and the administration. Aside from facilitating communication among researchers and between developers and users, the Workshop provided the organizers with valuable information about existing resources, tools, developers and providers. This information has allowed us to go a step further by setting up a harvesting procedure which will hopefully build the seed of a portal-catalogue-observatory of language resources and technologies in Catalan.

A Global Relaxation Labeling Approach to Coreference Resolution
Emili Sapena | Lluís Padró | Jordi Turmo
Coling 2010: Posters

RelaxCor: A Global Relaxation Labeling Approach to Coreference Resolution
Emili Sapena | Lluís Padró | Jordi Turmo
Proceedings of the 5th International Workshop on Semantic Evaluation

2007

UPC: Experiments with Joint Learning within SemEval Task 9
Lluís Màrquez | Lluís Padró | Mihai Surdeanu | Luis Villarejo
Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007)

2006

This paper describes version 1.3 of the FreeLing suite of NLP tools. FreeLing was first released in February 2004 providing morphological analysis and PoS tagging for Catalan, Spanish, and English. From then on, the package has been improved and enlarged to cover more languages (i.e. Italian and Galician) and offer more services: Named entity recognition and classification, chunking, dependency parsing, and WordNet based semantic annotation. FreeLing is not conceived as an end-user oriented tool, but as a library on top of which powerful NLP applications can be developed. Nevertheless, sample interface programs are provided, which can be straightforwardly used as fast, flexible, and efficient corpus processing tools. A remarkable feature of FreeLing is that it is distributed under a free-software LGPL license, thus enabling any developer to adapt the package to his needs in order to get the most suitable behaviour for the application being developed.

2005

We present the current status of development of an open architecture for the translation from Spanish into Basque. The machine translation architecture uses an open source analyser for Spanish and new modules mainly based on finite-state transducers. The project is integrated in the OpenTrad initiative, a larger government funded project shared among different universities and small companies, which will also include MT engines for translation among the main languages in Spain. The main objective is the construction of an open, reusable and interoperable framework. This paper describes the design of the engine, the formats it uses for the communication among the modules, the modules reused from another project named Matxin and the new modules we are building.

2004

Knowledge intensive e-mail summarization in CARPANTA
Laura Alonso | Irene Castellón | Bernardino Casas | Lluís Padró
Proceedings of the ACL Interactive Poster and Demonstration Sessions

FreeLing: An Open-Source Suite of Language Analyzers
Xavier Carreras | Isaac Chao | Lluís Padró | Muntsa Padró
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Multiple Sequence Alignment for Characterizing the Lineal Structure of Revision
Laura Alonso | Irene Castellón | Jordi Escribano | Xavier Messeguer | Lluís Padró
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

We present a first approach to the application of a data mining technique, Multiple Sequence Alignment, to the systematization of a polemic aspect of discourse, namely, the expression of contrast, concession, counterargument and semantically similar discursive relations. The representation of the phenomena under study is carried out by very simple techniques, mostly pattern-matching, but the results allow us to draw insightful conclusions on the organization of this aspect of discourse: equivalence classes of discourse markers are established, and systematic patterns are discovered, which will be applied in enhancing a discursive parser.

2003

A Simple Named Entity Extractor using AdaBoost
Xavier Carreras | Lluís Màrquez | Lluís Padró
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

Learning a Perceptron-Based Named Entity Chunker via Online Recognition Feedback
Xavier Carreras | Lluís Màrquez | Lluís Padró
Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003

Low-cost Named Entity Classification for Catalan: Exploiting Multilingual Resources and Unlabeled Data
Lluís Màrquez | Adrià de Gispert | Xavier Carreras | Lluís Padró
Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition

Named Entity Recognition For Catalan Using Only Spanish Resources and Unlabelled Data
Xavier Carreras | Lluís Màrquez | Lluís Padró
10th Conference of the European Chapter of the Association for Computational Linguistics

2002

A Flexible Distributed Architecture for Natural Language Analyzers
Xavier Carreras | Lluís Padró
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

Named Entity Extraction using AdaBoost
Xavier Carreras | Lluís Màrquez | Lluís Padró
COLING-02: The 6th Conference on Natural Language Learning 2002 (CoNLL-2002)

2001