Recycling annotated parallel corpora for bilingual document composition

Arantza Casillas, Joseba Abaitua, Raquel Martínez


Abstract
Parallel corpora enriched with descriptive annotations facilitate the development of multilingual authoring tools. Starting from an annotated bitext, we show how SGML markup can be recycled to produce complementary language resources. On the one hand, several translation memory databases, together with glossaries of proper nouns, have been produced. On the other hand, DTDs for source and target documents have been derived and put into correspondence. This paper discusses how these resources have been automatically generated and applied to an interactive bilingual authoring system. This tool is capable of handling a substantial proportion of text in both the composition and the translation of structured documents.
Anthology ID:
2000.amta-papers.12
Volume:
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers
Month:
October 10-14
Year:
2000
Address:
Cuernavaca, Mexico
Venue:
AMTA
Publisher:
Springer
Pages:
117–126
URL:
https://link.springer.com/chapter/10.1007/3-540-39965-8_12
Cite (ACL):
Arantza Casillas, Joseba Abaitua, and Raquel Martínez. 2000. Recycling annotated parallel corpora for bilingual document composition. In Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers, pages 117–126, Cuernavaca, Mexico. Springer.
Cite (Informal):
Recycling annotated parallel corpora for bilingual document composition (Casillas et al., AMTA 2000)