A universal approach to translating numerical and time expressions

Mei Tu, Yu Zhou, Chengqing Zong


Abstract
Although statistical machine translation (SMT) has made great progress since its inception, the translation of numerical and time expressions is still far from satisfactory. Numbers are likely to be out-of-vocabulary (OOV) words because they cannot be enumerated exhaustively, even when the training data is very large, so it is difficult to translate the infinite set of numbers accurately with traditional statistical methods alone. We propose a language-independent framework that recognizes and translates numbers more precisely using a rule-based method. By designing operators, we make the rules educible and completely separate from the code, so the rules can be extended to various language pairs without re-coding, which supports the efficient development of a portable SMT system. We classify numerical and time expressions into seven types: Arabic numbers, cardinal numbers, ordinal numbers, dates, times of day, days of the week, and figures. A greedy algorithm is developed to deal with rule conflicts. Experiments show that our approach can significantly improve translation performance.
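The abstract does not give the rule format or the greedy criterion, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes rules are stored as plain data (a hypothetical `RULES` table of type names and regular expressions) and that conflicts between overlapping matches are resolved by a longest-match-first greedy pass. The function names `find_candidates`, `resolve_greedy`, and `recognize` are illustrative.

```python
import re

# Hypothetical rule table: (expression type, regex). Rules are plain data,
# so they could be replaced per language without changing the matching code.
RULES = [
    ("arabic_number", r"\d+(?:\.\d+)?"),
    ("date", r"\d{4}-\d{2}-\d{2}"),
    ("time_of_day", r"\d{1,2}:\d{2}"),
]


def find_candidates(text, rules):
    """Return every (start, end, type) span matched by any rule."""
    spans = []
    for kind, pattern in rules:
        for m in re.finditer(pattern, text):
            spans.append((m.start(), m.end(), kind))
    return spans


def resolve_greedy(spans):
    """Assumed greedy policy: take the longest spans first and drop any
    span that overlaps one already kept."""
    chosen = []
    by_length = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    for start, end, kind in by_length:
        if all(end <= c[0] or start >= c[1] for c in chosen):
            chosen.append((start, end, kind))
    return sorted(chosen)  # restore left-to-right order


def recognize(text, rules=RULES):
    """Return (matched text, type) pairs after conflict resolution."""
    spans = resolve_greedy(find_candidates(text, rules))
    return [(text[s:e], kind) for s, e, kind in spans]


print(recognize("Meet on 2012-12-06 at 9:30 with 3 people"))
```

In this sketch the date rule wins over the shorter number matches inside `2012-12-06` because longer spans are considered first; a real system would also need a translation step for each recognized type, which is not shown here.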
Anthology ID:
2012.iwslt-papers.9
Volume:
Proceedings of the 9th International Workshop on Spoken Language Translation: Papers
Month:
December 6-7
Year:
2012
Address:
Hong Kong
Venue:
IWSLT
SIG:
SIGSLT
Pages:
209–216
URL:
https://aclanthology.org/2012.iwslt-papers.9
Cite (ACL):
Mei Tu, Yu Zhou, and Chengqing Zong. 2012. A universal approach to translating numerical and time expressions. In Proceedings of the 9th International Workshop on Spoken Language Translation: Papers, pages 209–216, Hong Kong.
Cite (Informal):
A universal approach to translating numerical and time expressions (Tu et al., IWSLT 2012)
PDF:
https://preview.aclanthology.org/emnlp-22-attachments/2012.iwslt-papers.9.pdf