Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms
Guillaume Jacquet, Maud Ehrmann, Ralf Steinberger, Jaakko Väyrynen
Abstract
This paper reports on an approach and experiments to automatically build a cross-lingual multi-word entity resource. Starting from a collection of millions of acronym/expansion pairs for 22 languages, in which expansion variants were grouped into monolingual clusters, we experiment with several aggregation strategies to link these clusters across languages. The aggregation strategies use string similarity distances and translation probabilities, and are based on vector space and graph representations. The accuracy of the approach is evaluated against Wikipedia's redirection and cross-lingual linking tables. The resulting multi-word entity resource contains 64,000 multi-word entities with unique identifiers and their 600,000 multilingual lexical variants. We intend to make this new resource publicly available.
- Anthology ID: L16-1084
- Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month: May
- Year: 2016
- Address: Portorož, Slovenia
- Editors: Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue: LREC
- Publisher: European Language Resources Association (ELRA)
- Pages: 528–535
- URL: https://aclanthology.org/L16-1084
- Cite (ACL): Guillaume Jacquet, Maud Ehrmann, Ralf Steinberger, and Jaakko Väyrynen. 2016. Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 528–535, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal): Cross-lingual Linking of Multi-word Entities and their corresponding Acronyms (Jacquet et al., LREC 2016)
- PDF: https://preview.aclanthology.org/ingest-acl-2023-videos/L16-1084.pdf