ImageTra: Real-Time Translation for Texts in Image and Video
Hour Kaing, Jiannan Mao, Haiyue Song, Chenchen Ding, Hideki Tanaka, Masao Utiyama
Abstract
There has been growing research interest in in-image machine translation, which involves translating text in images from one language to another. Recent studies continue to explore pipeline-based systems because they are straightforward to construct and their underlying components keep improving. However, existing implementations of such pipelines often lack extensibility, composability, and support for real-time translation. Therefore, this work introduces ImageTra, an open-source toolkit designed to facilitate the development of pipeline-based systems for in-image machine translation. The toolkit integrates state-of-the-art open-source models and tools, and is designed with a focus on modularity and efficiency, making it particularly well-suited for real-time translation. The toolkit is released at https://github.com/hour/imagetra.
- Anthology ID:
- 2025.ijcnlp-demo.1
- Volume:
- Proceedings of The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations
- Month:
- December
- Year:
- 2025
- Address:
- Mumbai, India
- Editors:
- Xuebo Liu, Ayu Purwarianti
- Venue:
- IJCNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 1–8
- URL:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-demo.1/
- Cite (ACL):
- Hour Kaing, Jiannan Mao, Haiyue Song, Chenchen Ding, Hideki Tanaka, and Masao Utiyama. 2025. ImageTra: Real-Time Translation for Texts in Image and Video. In Proceedings of The 14th International Joint Conference on Natural Language Processing and The 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations, pages 1–8, Mumbai, India. Association for Computational Linguistics.
- Cite (Informal):
- ImageTra: Real-Time Translation for Texts in Image and Video (Kaing et al., IJCNLP 2025)
- PDF:
- https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.ijcnlp-demo.1.pdf