Iker García-Ferrero


2023

pdf
IXA/Cogcomp at SemEval-2023 Task 2: Context-enriched Multilingual Named Entity Recognition Using Knowledge Bases
Iker García-Ferrero | Jon Ander Campos | Oscar Sainz | Ander Salaberria | Dan Roth
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

Named Entity Recognition (NER) is a core natural language processing task in which pre-trained language models have shown remarkable performance. However, standard benchmarks like CoNLL 2003 do not address many of the challenges that deployed NER systems face, such as having to classify emerging or complex entities in a fine-grained way. In this paper we present a novel NER cascade approach comprising three steps: first, identifying candidate entities in the input sentence; second, linking the each candidate to an existing knowledge base; third, predicting the fine-grained category for each entity candidate. We empirically demonstrate the significance of external knowledge bases in accurately classifying fine-grained and emerging entities. Our system exhibits robust performance in the MultiCoNER2 shared task, even in the low-resource language setting where we leverage knowledge bases of high-resource languages.

2022

pdf
Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings
Iker García-Ferrero | Rodrigo Agerri | German Rigau
Findings of the Association for Computational Linguistics: EMNLP 2022

Zero-resource cross-lingual transfer approaches aim to apply supervised modelsfrom a source language to unlabelled target languages. In this paper we performan in-depth study of the two main techniques employed so far for cross-lingualzero-resource sequence labelling, based either on data or model transfer.Although previous research has proposed translation and annotation projection(data-based cross-lingual transfer) as an effective technique for cross-lingualsequence labelling, in this paper we experimentally demonstrate that highcapacity multilingual language models applied in a zero-shot (model-basedcross-lingual transfer) setting consistently outperform data-basedcross-lingual transfer approaches. A detailed analysis of our results suggeststhat this might be due to important differences in language use. Morespecifically, machine translation often generates a textual signal which isdifferent to what the models are exposed to when using gold standard data,which affects both the fine-tuning and evaluation processes. Our results alsoindicate that data-based cross-lingual transfer approaches remain a competitiveoption when high-capacity multilingual language models are not available.

2021

pdf
Benchmarking Meta-embeddings: What Works and What Does Not
Iker García-Ferrero | Rodrigo Agerri | German Rigau
Findings of the Association for Computational Linguistics: EMNLP 2021

In the last few years, several methods have been proposed to build meta-embeddings. The general aim was to obtain new representations integrating complementary knowledge from different source pre-trained embeddings thereby improving their overall quality. However, previous meta-embeddings have been evaluated using a variety of methods and datasets, which makes it difficult to draw meaningful conclusions regarding the merits of each approach. In this paper we propose a unified common framework, including both intrinsic and extrinsic tasks, for a fair and objective meta-embeddings evaluation. Furthermore, we present a new method to generate meta-embeddings, outperforming previous work on a large number of intrinsic evaluation benchmarks. Our evaluation framework also allows us to conclude that previous extrinsic evaluations of meta-embeddings have been overestimated.