Abstract
We introduce a light-weight interlingua for a cross-language document retrieval system in the medical domain. It is composed of equivalence classes of semantically primitive, language-specific subwords which are clustered by interlingual and intralingual synonymy. Each subword cluster represents a basic conceptual entity of the language-independent interlingua. Documents, as well as queries, are mapped to this interlingua level on which retrieval operations are performed. Evaluation experiments reveal that this interlingua-based retrieval model outperforms a direct translation approach.
- Anthology ID:
- 2005.mtsummit-papers.3
- Volume:
- Proceedings of Machine Translation Summit X: Papers
- Month:
- September 13-15
- Year:
- 2005
- Address:
- Phuket, Thailand
- Venue:
- MTSummit
- Pages:
- 17–24
- URL:
- https://aclanthology.org/2005.mtsummit-papers.3
- Cite (ACL):
- Udo Hahn, Kornel Marko, and Stefan Schulz. 2005. Subword Clusters as Light-Weight Interlingua for Multilingual Document Retrieval. In Proceedings of Machine Translation Summit X: Papers, pages 17–24, Phuket, Thailand.
- Cite (Informal):
- Subword Clusters as Light-Weight Interlingua for Multilingual Document Retrieval (Hahn et al., MTSummit 2005)
- PDF:
- https://preview.aclanthology.org/auto-file-uploads/2005.mtsummit-papers.3.pdf
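The abstract's core mechanism can be sketched in Python. Language-specific subwords are grouped into interlingual clusters, and both documents and queries are mapped to cluster identifiers before any matching happens. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy lexicon `SUBWORD_CLUSTERS`, the cluster labels such as `#NEPHR`, the greedy longest-match segmentation in `segment`, and the cosine ranking over bags of cluster IDs are all hypothetical choices made only for this example.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicon: language-specific subwords -> interlingual cluster IDs.
# Synonyms within a language (e.g. "kidney"/"nephr") and across languages
# (e.g. "kidney"/"niere") share one ID; entries are illustrative only.
SUBWORD_CLUSTERS = {
    "en": {"kidney": "#NEPHR", "nephr": "#NEPHR", "inflamm": "#INFLAMM",
           "itis": "#INFLAMM", "stone": "#LITH", "lith": "#LITH"},
    "de": {"niere": "#NEPHR", "nier": "#NEPHR", "nephr": "#NEPHR",
           "entzünd": "#INFLAMM", "itis": "#INFLAMM", "stein": "#LITH"},
}


def segment(word, lexicon):
    """Greedy longest-match segmentation into cluster IDs.

    A simplification: characters that start no known subword are skipped.
    """
    units, i = [], 0
    max_len = max(map(len, lexicon))
    while i < len(word):
        for j in range(min(len(word), i + max_len), i, -1):
            if word[i:j] in lexicon:
                units.append(lexicon[word[i:j]])
                i = j
                break
        else:
            i += 1  # no subword starts at position i
    return units


def to_interlingua(text, lang):
    """Map a text to a bag of language-independent cluster IDs."""
    lexicon = SUBWORD_CLUSTERS[lang]
    ids = Counter()
    for token in re.findall(r"\w+", text.lower()):
        ids.update(segment(token, lexicon))
    return ids


def cosine(a, b):
    """Cosine similarity between two bags of cluster IDs."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def rank(query, query_lang, docs):
    """Rank documents against a query, comparing both at the interlingua level."""
    q = to_interlingua(query, query_lang)
    scored = [(doc_id, cosine(q, to_interlingua(text, lang)))
              for doc_id, (text, lang) in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


DOCS = {
    "en_doc1": ("Nephritis is an inflammation of the kidney.", "en"),
    "en_doc2": ("Kidney stones cause pain.", "en"),
}

print(rank("Nierenentzündung", "de", DOCS))
```

The German query *Nierenentzündung* ("kidney inflammation") shares no surface string with the English documents. It nevertheless maps to the same `#NEPHR` and `#INFLAMM` clusters as `en_doc1`, so the script prints `[('en_doc1', 1.0), ('en_doc2', 0.5)]`. No query or document is translated in this sketch; matching happens only on cluster IDs, which is the property the abstract sets against a direct translation approach.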