Abstract
In this work, we test Google Translate’s recently introduced Sanskrit-English translation system using a relatively small set of probe test cases. The cases focus on areas that, based on knowledge of Sanskrit and English grammar, we expect to pose a challenge for translation between the two languages. We summarize findings that point to significant gaps in the current Zero-Shot Neural Multilingual Translation (Zero-Shot NMT) approach to Sanskrit-English translation. We then suggest an approach based on Sanskrit grammar to create a differential parallel corpus as corrective training data to address such gaps. This approach should also generalize to other language pairs that have few learning resources but a good grammatical theory.
- Anthology ID:
- 2023.clasp-1.16
- Original:
- 2023.clasp-1.16v1
- Version 2:
- 2023.clasp-1.16v2
- Volume:
- Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD)
- Month:
- September
- Year:
- 2023
- Address:
- Gothenburg, Sweden
- Editors:
- Ellen Breitholtz, Shalom Lappin, Sharid Loaiciga, Nikolai Ilinykh, Simon Dobnik
- Venue:
- CLASP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 141–166
- URL:
- https://aclanthology.org/2023.clasp-1.16
- Cite (ACL):
- Amit Rao and Kanchi Gopinath. 2023. A Sanskrit grammar-based approach to identify and address gaps in Google Translate’s Sanskrit-English zero-shot NMT. In Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), pages 141–166, Gothenburg, Sweden. Association for Computational Linguistics.
- Cite (Informal):
- A Sanskrit grammar-based approach to identify and address gaps in Google Translate’s Sanskrit-English zero-shot NMT (Rao & Gopinath, CLASP 2023)
- PDF:
- https://preview.aclanthology.org/ml4al-ingestion/2023.clasp-1.16.pdf