Thot: a Toolkit To Train Phrase-based Statistical Translation Models
Daniel Ortiz-Martínez, Ismael García-Varea, Francisco Casacuberta
Abstract
In this paper, we present the Thot toolkit, a publicly available, open-source set of tools for training phrase-based models for statistical machine translation. The toolkit obtains phrase-based models from word-based alignment models; to our knowledge, this functionality has not been offered by any publicly available toolkit. The Thot toolkit also implements a new way of estimating phrase models, which yields more complete phrase models than the methods described in the literature, including a segmentation length submodel. The toolkit output can be written in different formats so that it can be used by other statistical machine translation tools such as Pharaoh, a beam search decoder for phrase-based alignment models, which was used to perform translation experiments with the generated models. Additionally, the Thot toolkit can be used to obtain the best phrase-level alignment for a sentence pair.
- Anthology ID:
- 2005.mtsummit-papers.19
- Volume:
- Proceedings of Machine Translation Summit X: Papers
- Month:
- September 13-15
- Year:
- 2005
- Address:
- Phuket, Thailand
- Venue:
- MTSummit
- Pages:
- 141–148
- URL:
- https://aclanthology.org/2005.mtsummit-papers.19
- Cite (ACL):
- Daniel Ortiz-Martínez, Ismael García-Varea, and Francisco Casacuberta. 2005. Thot: a Toolkit To Train Phrase-based Statistical Translation Models. In Proceedings of Machine Translation Summit X: Papers, pages 141–148, Phuket, Thailand.
- Cite (Informal):
- Thot: a Toolkit To Train Phrase-based Statistical Translation Models (Ortiz-Martínez et al., MTSummit 2005)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2005.mtsummit-papers.19.pdf
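The abstract mentions deriving phrase-based models from word-based alignments. As a rough illustration of that general idea, the sketch below applies the standard consistency criterion for extracting phrase pairs from a single word-aligned sentence pair. It is not Thot's implementation or its new estimation method: the function name `extract_phrase_pairs`, the `max_len` parameter, and the toy sentence pair are all assumptions made for this example, and the sketch omits the usual extension over unaligned target words.

```python
from typing import List, Set, Tuple


def extract_phrase_pairs(
    src: List[str],
    tgt: List[str],
    alignment: Set[Tuple[int, int]],
    max_len: int = 4,
) -> Set[Tuple[str, str]]:
    """Return phrase pairs consistent with a word alignment.

    `alignment` holds (source_index, target_index) links. A phrase pair is
    consistent when no word inside the target span is linked to a word
    outside the source span. (Hypothetical helper, not part of Thot.)
    """
    pairs: Set[Tuple[str, str]] = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject the span if a target word inside [j1, j2] is linked
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1 : i2 + 1]), " ".join(tgt[j1 : j2 + 1])))
    return pairs


# Toy example (invented for illustration): Spanish to English with a reordering.
src = ["la", "casa", "verde"]
tgt = ["the", "green", "house"]
alignment = {(0, 0), (1, 2), (2, 1)}

for pair in sorted(extract_phrase_pairs(src, tgt, alignment)):
    print(pair)
```

In this toy case the span "la casa" is rejected because its target span would have to cover "green", which is linked to "verde" outside the source span. A phrase model would then be estimated from counts of such pairs over a whole corpus.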