Abstract
We present ParaNames, a Wikidata-derived multilingual parallel name resource consisting of names for approximately 14 million entities spanning over 400 languages. ParaNames is useful for multilingual language processing, both for defining name translation tasks and as supplementary data for other tasks. We demonstrate an application of ParaNames by training a multilingual model for canonical name translation to and from English.
- Anthology ID:
- 2022.sigtyp-1.15
- Volume:
- Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
- Month:
- July
- Year:
- 2022
- Address:
- Seattle, Washington
- Editors:
- Ekaterina Vylomova, Edoardo Ponti, Ryan Cotterell
- Venue:
- SIGTYP
- SIG:
- SIGTYP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 103–105
- URL:
- https://preview.aclanthology.org/icon-24-ingestion/2022.sigtyp-1.15/
- DOI:
- 10.18653/v1/2022.sigtyp-1.15
- Cite (ACL):
- Jonne Sälevä and Constantine Lignos. 2022. ParaNames: A Massively Multilingual Entity Name Corpus. In Proceedings of the 4th Workshop on Research in Computational Linguistic Typology and Multilingual NLP, pages 103–105, Seattle, Washington. Association for Computational Linguistics.
- Cite (Informal):
- ParaNames: A Massively Multilingual Entity Name Corpus (Sälevä & Lignos, SIGTYP 2022)
- PDF:
- https://preview.aclanthology.org/icon-24-ingestion/2022.sigtyp-1.15.pdf
- Code:
- bltlab/paranames