A structurally diverse minimal corpus for eliciting structural mappings between languages

Katharina Probst, Alon Lavie


Abstract
We describe an approach to creating a small but diverse corpus in English that can be used to elicit information about any target language. The focus of the corpus is on structural information. The resulting bilingual corpus can then be used for natural language processing tasks such as inferring transfer mappings for Machine Translation. The corpus is small enough that a bilingual user can translate and word-align it within a matter of hours. We describe how the corpus is created and how its structural diversity is ensured. We then argue that it is not necessary to introduce a large amount of redundancy into the corpus. We show this by creating an increasingly redundant corpus and observing that the information gained converges as redundancy increases.
Anthology ID:
2004.amta-papers.24
Volume:
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
September 28 - October 2
Year:
2004
Address:
Washington, USA
Venue:
AMTA
Publisher:
Springer
Pages:
217–226
URL:
https://link.springer.com/chapter/10.1007/978-3-540-30194-3_24
Cite (ACL):
Katharina Probst and Alon Lavie. 2004. A structurally diverse minimal corpus for eliciting structural mappings between languages. In Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 217–226, Washington, USA. Springer.
Cite (Informal):
A structurally diverse minimal corpus for eliciting structural mappings between languages (Probst & Lavie, AMTA 2004)
PDF:
https://link.springer.com/chapter/10.1007/978-3-540-30194-3_24