Subword Clusters as Light-Weight Interlingua for Multilingual Document Retrieval

Udo Hahn, Kornel Marko, Stefan Schulz


Abstract
We introduce a light-weight interlingua for a cross-language document retrieval system in the medical domain. It is composed of equivalence classes of semantically primitive, language-specific subwords, which are clustered by interlingual and intralingual synonymy. Each subword cluster represents a basic conceptual entity of the language-independent interlingua. Documents and queries alike are mapped to this interlingua level, at which retrieval operations are performed. Evaluation experiments reveal that this interlingua-based retrieval model outperforms a direct translation approach.
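The abstract describes mapping both documents and queries onto shared subword-cluster identifiers and retrieving at that level. The sketch below illustrates the idea under loudly stated assumptions. The tiny lexicon `LEXICON`, the cluster labels such as `#liver`, the greedy longest-match segmentation, and cosine scoring over cluster-ID counts are all hypothetical choices made for illustration. They are not the authors' lexicon, segmentation rules, or retrieval engine.

```python
from collections import Counter
import math

# HYPOTHETICAL toy lexicon: language-specific subwords -> interlingual cluster IDs.
# Intra- and interlingual synonyms share one ID. Entries are illustrative only.
LEXICON = {
    "hepat": "#liver", "liver": "#liver", "leber": "#liver",
    "itis": "#inflam", "inflamm": "#inflam", "entzünd": "#inflam",
    "kidney": "#kidney", "nephr": "#kidney", "niere": "#kidney",
}
MAX_LEN = max(len(s) for s in LEXICON)


def to_interlingua(text):
    """Map text to a bag of cluster IDs.

    ASSUMPTION: greedy longest-match segmentation against LEXICON;
    characters not covered by any known subword are skipped.
    """
    ids = []
    for word in text.lower().split():
        i = 0
        while i < len(word):
            # Try the longest candidate subword starting at position i first.
            for j in range(min(len(word), i + MAX_LEN), i, -1):
                if word[i:j] in LEXICON:
                    ids.append(LEXICON[word[i:j]])
                    i = j
                    break
            else:
                i += 1  # no subword starts here; advance one character
    return Counter(ids)


def cosine(a, b):
    """Cosine similarity of two cluster-ID count vectors (ASSUMED scoring)."""
    dot = sum(a[k] * b[k] for k in a)
    na = sum(v * v for v in a.values())
    nb = sum(v * v for v in b.values())
    return dot / math.sqrt(na * nb) if na and nb else 0.0


def retrieve(query, docs):
    """Rank docs by similarity to the query, both mapped to the interlingua."""
    q = to_interlingua(query)
    return sorted(((cosine(q, to_interlingua(d)), d) for d in docs), reverse=True)


if __name__ == "__main__":
    docs = ["Nierenversagen", "kidney inflammation", "Leberentzündung bei Kindern"]
    for score, doc in retrieve("Hepatitis", docs):
        print(f"{score:.2f}  {doc}")
```

Here "hepat" + "itis" and German "leber" + "entzünd" resolve to the same pair of cluster IDs, so the German document ranks first for the query without any word-level translation step. That language independence is the property the interlingua is meant to provide.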
Anthology ID:
2005.mtsummit-papers.3
Volume:
Proceedings of Machine Translation Summit X: Papers
Month:
September 13-15
Year:
2005
Address:
Phuket, Thailand
Venue:
MTSummit
Pages:
17–24
URL:
https://aclanthology.org/2005.mtsummit-papers.3
Cite (ACL):
Udo Hahn, Kornel Marko, and Stefan Schulz. 2005. Subword Clusters as Light-Weight Interlingua for Multilingual Document Retrieval. In Proceedings of Machine Translation Summit X: Papers, pages 17–24, Phuket, Thailand.
Cite (Informal):
Subword Clusters as Light-Weight Interlingua for Multilingual Document Retrieval (Hahn et al., MTSummit 2005)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2005.mtsummit-papers.3.pdf