Abstract
Tamarian, a fictional language introduced in the Star Trek episode “Darmok,” communicates meaning through utterances of metaphorical references, such as “Darmok and Jalad at Tanagra” instead of “We should work together.” This work assembles a Tamarian-English dictionary of utterances from the original episode and several follow-on novels, and uses it to construct a parallel corpus of 456 English-Tamarian utterances. A machine translation system based on a large language model (T5) is trained on this parallel corpus and achieves 76% accuracy when translating known utterances from English to Tamarian.
- Anthology ID: 2022.flp-1.5
- Volume: Proceedings of the 3rd Workshop on Figurative Language Processing (FLP)
- Month: December
- Year: 2022
- Address: Abu Dhabi, United Arab Emirates (Hybrid)
- Editors: Debanjan Ghosh, Beata Beigman Klebanov, Smaranda Muresan, Anna Feldman, Soujanya Poria, Tuhin Chakrabarty
- Venue: Fig-Lang
- Publisher: Association for Computational Linguistics
- Pages: 34–38
- URL: https://aclanthology.org/2022.flp-1.5
- DOI: 10.18653/v1/2022.flp-1.5
- Cite (ACL): Peter A. Jansen and Jordan Boyd-Graber. 2022. Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language. In Proceedings of the 3rd Workshop on Figurative Language Processing (FLP), pages 34–38, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
- Cite (Informal): Picard understanding Darmok: A Dataset and Model for Metaphor-Rich Translation in a Constructed Language (Jansen & Boyd-Graber, Fig-Lang 2022)
- PDF: https://preview.aclanthology.org/nschneid-patch-4/2022.flp-1.5.pdf