Abstract
Simultaneous interpretation is a challenging application of machine translation in which the input is a stream of words from a speech recognition engine. The key problem is how to segment the stream online into units suitable for translation. The segmentation process calculates a confidence score for each word that indicates the soundness of placing a sentence boundary after it, and then applies heuristics to determine the position of the boundaries. Multiple variants of the confidence scoring method and segmentation heuristics were studied. Experimental results show that the best-performing strategy is not only efficient in terms of average latency per word, but also achieves end-to-end translation quality close to that of an offline baseline and of oracle segmentation.
- Anthology ID:
- W16-4613
- Volume:
- Proceedings of the 3rd Workshop on Asian Translation (WAT2016)
- Month:
- December
- Year:
- 2016
- Address:
- Osaka, Japan
- Editors:
- Toshiaki Nakazawa, Hideya Mino, Chenchen Ding, Isao Goto, Graham Neubig, Sadao Kurohashi, Ir. Hammam Riza, Pushpak Bhattacharyya
- Venue:
- WAT
- Publisher:
- The COLING 2016 Organizing Committee
- Pages:
- 139–148
- URL:
- https://aclanthology.org/W16-4613
- Cite (ACL):
- Xiaolin Wang, Andrew Finch, Masao Utiyama, and Eiichiro Sumita. 2016. An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation. In Proceedings of the 3rd Workshop on Asian Translation (WAT2016), pages 139–148, Osaka, Japan. The COLING 2016 Organizing Committee.
- Cite (Informal):
- An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation (Wang et al., WAT 2016)
- PDF:
- https://preview.aclanthology.org/ingest-acl-2023-videos/W16-4613.pdf
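
The two-step process described in the abstract (score each word for how sound a boundary after it would be, then apply a heuristic to place boundaries) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring method or the heuristics, so the scores are taken as given, and the function name `segment_stream`, the `threshold` rule, and the `max_pending` latency cap are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def segment_stream(
    scored_words: Iterable[Tuple[str, float]],
    threshold: float = 0.5,   # hypothetical confidence cutoff
    max_pending: int = 5,     # hypothetical cap on buffered words
) -> Iterator[List[str]]:
    """Yield segments from a stream of (word, boundary_confidence) pairs.

    The confidence is assumed to be supplied by some upstream scorer;
    how it is computed is outside the scope of this sketch.
    """
    pending: List[Tuple[str, float]] = []
    for word, confidence in scored_words:
        pending.append((word, confidence))
        if confidence >= threshold:
            # Confident boundary: emit everything up to and including this word.
            cut = len(pending)
        elif len(pending) >= max_pending:
            # Buffer is full: cut after the best-scoring buffered word
            # so that latency stays bounded.
            cut = max(range(len(pending)), key=lambda i: pending[i][1]) + 1
        else:
            continue
        yield [w for w, _ in pending[:cut]]
        pending = pending[cut:]
    if pending:
        # End of stream: emit whatever is left in the buffer.
        yield [w for w, _ in pending]


if __name__ == "__main__":
    stream = [("hello", 0.1), ("world", 0.9), ("this", 0.1), ("is", 0.2),
              ("a", 0.1), ("long", 0.3), ("one", 0.1), ("ok", 0.8)]
    for segment in segment_stream(stream, threshold=0.5, max_pending=4):
        print(" ".join(segment))
```

Each emitted segment would be passed to the translation system as soon as it is produced. Lowering `max_pending` reduces the delay before a word is translated, at the cost of forcing boundaries at less confident positions.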