Víctor Fresno

Also published as: Victor Fresno

2022

Is anisotropy really the cause of BERT embeddings not being semantic?
Alejandro Fuster Baggetto | Victor Fresno
Findings of the Association for Computational Linguistics: EMNLP 2022

In this paper we conduct a set of experiments aimed at improving our understanding of the lack of semantic isometry in BERT, i.e. the lack of correspondence between the embedding and meaning spaces of its contextualized word representations. Our empirical results show that, contrary to popular belief, anisotropy is not the root cause of the poor performance of these contextual models’ embeddings in semantic tasks. What does affect both anisotropy and semantic isometry is a set of known biases: frequency, subword, punctuation, and case. For each of them, we measure its magnitude and the effect of its removal, showing that these biases contribute to, but do not completely explain, the anisotropy and lack of semantic isometry of these contextual language models.

2014

A Data Driven Approach for Person Name Disambiguation in Web Search Results
Agustín D. Delgado | Raquel Martínez | Víctor Fresno | Soto Montalvo
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

In this paper we introduce TweetNorm_es, an annotated corpus of tweets in Spanish language, which we make publicly available under the terms of the CC-BY license. This corpus is intended for development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort from different research groups. In this paper we describe the methodology defined to build the corpus as well as the guidelines followed in the annotation process. We also present a brief overview of the Tweet-Norm shared task, as the first evaluation environment where the corpus was used.
