Ricardo Baeza-Yates


2025

Graph-Linguistic Fusion: Using Language Models for Wikidata Vandalism Detection
Mykola Trokhymovych | Lydia Pintscher | Ricardo Baeza-Yates | Diego Sáez Trumper
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

We introduce a next-generation vandalism detection system for Wikidata, one of the largest open-source structured knowledge bases on the Web. Wikidata is highly complex: its items incorporate an ever-expanding universe of factual triples and multilingual texts. While edits can alter both structured and textual content, our approach converts all edits into a single space using a method we call Graph2Text. This allows for evaluating all content changes for potential vandalism using a single multilingual language model. This unified approach improves coverage and simplifies maintenance. Experiments demonstrate that our solution outperforms the current production system. Additionally, we are releasing the code under an open license along with a large dataset of various human-generated knowledge alterations, enabling further research.

2016

CASSAurus: A Resource of Simpler Spanish Synonyms
Ricardo Baeza-Yates | Luz Rello | Julia Dembowski
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this work we introduce and describe a language resource composed of lists of simpler synonyms for Spanish. The synonyms are divided into different senses taken from the Spanish OpenThesaurus, where context disambiguation was performed using statistical information from the Web and Google Books Ngrams. This resource is freely available online and can be used for different NLP tasks such as lexical simplification. It has already been integrated into four tools.

2015

CASSA: A Context-Aware Synonym Simplification Algorithm
Ricardo Baeza-Yates | Luz Rello | Julia Dembowski
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

DysList: An Annotated Resource of Dyslexic Errors
Luz Rello | Ricardo Baeza-Yates | Joaquim Llisterri
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We introduce a language resource for Spanish, DysList, composed of a list of unique errors extracted from a collection of texts written by people with dyslexia. Each of the errors was annotated with a set of characteristics as well as visual and phonetic features. To the best of our knowledge, this is the largest resource of this kind, especially given the difficulty of finding texts written by people with dyslexia.

Keyword Highlighting Improves Comprehension for People with Dyslexia
Luz Rello | Horacio Saggion | Ricardo Baeza-Yates
Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)

2013

Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility
Luz Rello | Horacio Saggion | Ricardo Baeza-Yates
Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility

2012

Elliphant: Improved Automatic Detection of Zero Subjects and Impersonal Constructions in Spanish
Luz Rello | Ricardo Baeza-Yates | Ruslan Mitkov
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

Graphical Schemes May Improve Readability but Not Understandability for People with Dyslexia
Luz Rello | Horacio Saggion | Ricardo Baeza-Yates | Eduardo Graells
Proceedings of the First Workshop on Predicting and Improving Text Readability for target reader populations