2025
A Query-Response Framework for Whole-Page Complex-Layout Document Image Translation with Relevant Regional Concentration
Zhiyang Zhang | Yaping Zhang | Yupu Liang | Zhiyuan Chen | Lu Xiang | Yang Zhao | Yu Zhou | Chengqing Zong
Findings of the Association for Computational Linguistics: ACL 2025
Document Image Translation (DIT), which aims to translate documents in images from a source language to a target language, plays an important role in Document Intelligence. It requires a comprehensive understanding of document multi-modalities and focused concentration on the relevant textual regions during translation. However, most existing methods rely on the vanilla encoder-decoder paradigm and thus lose concentration on key regions, which is especially crucial for complex-layout document translation. To tackle this issue, we propose a new Query-Response DIT framework (QRDIT). QRDIT reformulates the DIT task as a parallel response/translation process over multiple queries (i.e., relevant source texts), explicitly directing its focus toward the most relevant textual regions to ensure translation accuracy. A novel dynamic aggregation mechanism is also designed to enhance the text semantics in query features for translation. Extensive experiments in four translation directions on three benchmarks demonstrate its state-of-the-art performance, showing significant translation quality improvements on whole-page complex-layout document images.
Improving MLLM’s Document Image Machine Translation via Synchronously Self-reviewing Its OCR Proficiency
Yupu Liang | Yaping Zhang | Zhiyang Zhang | Zhiyuan Chen | Yang Zhao | Lu Xiang | Chengqing Zong | Yu Zhou
Findings of the Association for Computational Linguistics: ACL 2025
Multimodal Large Language Models (MLLMs) have shown strong performance on document image tasks, especially Optical Character Recognition (OCR). However, they struggle with Document Image Machine Translation (DIMT), which requires handling both cross-modal and cross-lingual challenges. Previous efforts to enhance DIMT capability through Supervised Fine-Tuning (SFT) on DIMT datasets often cause the model to forget its existing monolingual abilities, such as OCR. To address these challenges, we introduce a novel fine-tuning paradigm in which the model Synchronously Self-Reviews (SSR) its OCR proficiency, inspired by the concept of the “Bilingual Cognitive Advantage”. Specifically, SSR prompts the model to generate the OCR text before producing the translation, allowing it to leverage its strong monolingual OCR ability while learning to translate text across languages. Comprehensive experiments demonstrate that the proposed SSR learning helps mitigate catastrophic forgetting, improving the generalization ability of MLLMs on both OCR and DIMT tasks. The code will be released upon acceptance.
2016
Lifelong Machine Learning for Natural Language Processing
Zhiyuan Chen | Bing Liu
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Machine learning (ML) has been successfully used as a prevalent approach to solving numerous NLP problems. However, the classic ML paradigm learns in isolation: given a dataset, an ML algorithm is executed on it to produce a model without using any related or prior knowledge. Although this type of isolated learning is very useful, it also has serious limitations, as it does not accumulate knowledge learned in the past and use that knowledge to help future learning, which is the hallmark of human learning and human intelligence. Lifelong machine learning (LML) aims to achieve this capability. Specifically, it aims to design and develop computational learning systems and algorithms that learn as humans do, i.e., retaining the results learned in the past, abstracting knowledge from them, and using that knowledge to help future learning. In this tutorial, we will introduce existing research on LML and show that LML is well suited to NLP tasks and has the potential to help NLP make major progress.
2015
Lifelong Machine Learning for Topic Modeling and Beyond
Zhiyuan Chen
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop
Lifelong Learning for Sentiment Classification
Zhiyuan Chen | Nianzu Ma | Bing Liu
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
2014
Review Topic Discovery with Phrases using the Pólya Urn Model
Geli Fei | Zhiyuan Chen | Bing Liu
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
Aspect Extraction with Automated Prior Knowledge Learning
Zhiyuan Chen | Arjun Mukherjee | Bing Liu
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2013
Exploiting Domain Knowledge in Aspect Extraction
Zhiyuan Chen | Arjun Mukherjee | Bing Liu | Meichun Hsu | Malu Castellanos | Riddhiman Ghosh
Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing
Identifying Intention Posts in Discussion Forums
Zhiyuan Chen | Bing Liu | Meichun Hsu | Malu Castellanos | Riddhiman Ghosh
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies