Víctor Fresno

Also published as: Victor Fresno


2022

Is anisotropy really the cause of BERT embeddings not being semantic?
Alejandro Fuster Baggetto | Victor Fresno
Findings of the Association for Computational Linguistics: EMNLP 2022

In this paper we conduct a set of experiments aimed at improving our understanding of the lack of semantic isometry in BERT, i.e. the lack of correspondence between the embedding and meaning spaces of its contextualized word representations. Our empirical results show that, contrary to popular belief, anisotropy is not the root cause of the poor performance of these contextual models’ embeddings in semantic tasks. What does affect both anisotropy and semantic isometry is a set of known biases: frequency, subword, punctuation, and case. For each of them, we measure its magnitude and the effect of its removal, showing that these biases contribute to, but do not completely explain, the anisotropy and lack of semantic isometry of these contextual language models.

2014

A Data Driven Approach for Person Name Disambiguation in Web Search Results
Agustín D. Delgado | Raquel Martínez | Víctor Fresno | Soto Montalvo
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

TweetNorm_es: an annotated corpus for Spanish microtext normalization
Iñaki Alegria | Nora Aranberri | Pere Comas | Víctor Fresno | Pablo Gamallo | Lluis Padró | Iñaki San Vicente | Jordi Turmo | Arkaitz Zubiaga
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper we introduce TweetNorm_es, an annotated corpus of tweets in Spanish, which we make publicly available under the terms of the CC-BY license. This corpus is intended for the development and testing of microtext normalization systems. It was created for Tweet-Norm, a tweet normalization workshop and shared task, and is the result of a joint annotation effort by different research groups. We describe the methodology defined to build the corpus as well as the guidelines followed in the annotation process. We also present a brief overview of the Tweet-Norm shared task, the first evaluation environment in which the corpus was used.

2009

Is Unlabeled Data Suitable for Multiclass SVM-based Web Page Classification?
Arkaitz Zubiaga | Víctor Fresno | Raquel Martínez
Proceedings of the NAACL HLT 2009 Workshop on Semi-supervised Learning for Natural Language Processing

2006

Multilingual Document Clustering: An Heuristic Approach Based on Cognate Named Entities
Soto Montalvo | Raquel Martínez | Arantza Casillas | Víctor Fresno
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics