A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification
Sosuke Nishikawa, Ikuya Yamada, Yoshimasa Tsuruoka, Isao Echizen
Abstract
We present a multilingual bag-of-entities model that effectively boosts the performance of zero-shot cross-lingual text classification by extending a multilingual pre-trained language model (e.g., M-BERT). It leverages the multilingual nature of Wikidata: entities in multiple languages representing the same concept are defined with a unique identifier. This enables entities described in multiple languages to be represented using shared embeddings. A model trained on entity features in a resource-rich language can thus be directly applied to other languages. Our experimental results on cross-lingual topic classification (using the MLDoc and TED-CLDC datasets) and entity typing (using the SHINRA2020-ML dataset) show that the proposed model consistently outperforms state-of-the-art models.
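The central mechanism described in the abstract (language-agnostic Wikidata identifiers index a single shared entity embedding table, whose mean-pooled "bag" is combined with an M-BERT document representation for classification) can be pictured with the minimal PyTorch sketch below. This is not the authors' implementation: the class name `BagOfEntitiesClassifier`, the embedding dimensions, and the entity indices are illustrative assumptions, and entity linking to Wikidata IDs is assumed to happen upstream.

```python
# Illustrative sketch only (not the paper's code): an M-BERT text encoder
# combined with a bag of language-agnostic entity embeddings shared across
# languages via Wikidata identifiers.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BagOfEntitiesClassifier(nn.Module):
    def __init__(self, num_entities, entity_dim, num_labels,
                 encoder_name="bert-base-multilingual-cased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One shared row per Wikidata entity: the same embedding is used
        # no matter which language the entity is mentioned in.
        self.entity_emb = nn.EmbeddingBag(num_entities, entity_dim, mode="mean")
        self.classifier = nn.Linear(hidden + entity_dim, num_labels)

    def forward(self, input_ids, attention_mask, entity_ids, entity_offsets):
        # [CLS] representation from the multilingual encoder.
        text_vec = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state[:, 0]
        # Mean-pooled ("bag of") entity embeddings for each document.
        entity_vec = self.entity_emb(entity_ids, entity_offsets)
        return self.classifier(torch.cat([text_vec, entity_vec], dim=-1))

# Hypothetical usage: mentions in any language (e.g., English "Tokyo" and
# German "Tokio") are first linked to the same Wikidata item and mapped to
# rows of a shared entity vocabulary before being passed to the model.
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BagOfEntitiesClassifier(num_entities=1_000_000, entity_dim=256, num_labels=4)
enc = tokenizer(["Tokyo hosted the 2020 Olympic Games."],
                return_tensors="pt", padding=True, truncation=True)
entity_ids = torch.tensor([42, 137])   # illustrative vocabulary indices of the linked entities
entity_offsets = torch.tensor([0])     # one document, starting at position 0
logits = model(enc["input_ids"], enc["attention_mask"], entity_ids, entity_offsets)
```

Because the entity features are tied to Wikidata IDs rather than to surface forms, a classifier trained this way on English documents can, in principle, be applied unchanged to documents in other languages, which is the zero-shot transfer setting evaluated in the paper.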
- Anthology ID: 2022.conll-1.1
- Volume: Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)
- Month: December
- Year: 2022
- Address: Abu Dhabi, United Arab Emirates (Hybrid)
- Editors: Antske Fokkens, Vivek Srikumar
- Venue: CoNLL
- SIG: SIGNLL
- Publisher: Association for Computational Linguistics
- Pages: 1–12
- URL: https://preview.aclanthology.org/icon-24-ingestion/2022.conll-1.1/
- DOI: 10.18653/v1/2022.conll-1.1
- Cite (ACL): Sosuke Nishikawa, Ikuya Yamada, Yoshimasa Tsuruoka, and Isao Echizen. 2022. A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification. In Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL), pages 1–12, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal): A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification (Nishikawa et al., CoNLL 2022)
- PDF: https://preview.aclanthology.org/icon-24-ingestion/2022.conll-1.1.pdf