UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer
Abdellah El Mekki, Abdelkader El Mahdaouy, Mohammed Akallouch, Ismail Berrada, Ahmed Khoumsi
Abstract
Building real-world complex Named Entity Recognition (NER) systems is a challenging task. This is due to the complexity and ambiguity of named entities that appear in various contexts such as short input sentences, emerging entities, and complex entities. Besides, real-world queries are mostly malformed, as they can be code-mixed or multilingual, among other scenarios. In this paper, we introduce our submitted system to the Multilingual Complex Named Entity Recognition (MultiCoNER) shared task. We approach the complex NER for multilingual and code-mixed queries, by relying on the contextualized representation provided by the multilingual Transformer XLM-RoBERTa. In addition to the CRF-based token classification layer, we incorporate a span classification loss to recognize named entities spans. Furthermore, we use a self-training mechanism to generate weakly-annotated data from a large unlabeled dataset. Our proposed system is ranked 6th and 8th in the multilingual and code-mixed MultiCoNER’s tracks respectively.- Anthology ID:
- 2022.semeval-1.207
- Volume:
- Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, United States
- Editors:
- Guy Emerson, Natalie Schluter, Gabriel Stanovsky, Ritesh Kumar, Alexis Palmer, Nathan Schneider, Siddharth Singh, Shyam Ratan
- Venue:
- SemEval
- SIG:
- SIGLEX
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 1511–1517
- Language:
- URL:
- https://aclanthology.org/2022.semeval-1.207
- DOI:
- 10.18653/v1/2022.semeval-1.207
- Cite (ACL):
- Abdellah El Mekki, Abdelkader El Mahdaouy, Mohammed Akallouch, Ismail Berrada, and Ahmed Khoumsi. 2022. UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer. In Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022), pages 1511–1517, Seattle, United States. Association for Computational Linguistics.
- Cite (Informal):
- UM6P-CS at SemEval-2022 Task 11: Enhancing Multilingual and Code-Mixed Complex Named Entity Recognition via Pseudo Labels using Multilingual Transformer (El Mekki et al., SemEval 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-5/2022.semeval-1.207.pdf
- Data
- MultiCoNER