Bowen Tan


Efficient (Soft) Q-Learning for Text Generation with Limited Good Data
Han Guo | Bowen Tan | Zhengzhong Liu | Eric Xing | Zhiting Hu
Findings of the Association for Computational Linguistics: EMNLP 2022

Maximum likelihood estimation (MLE) is the predominant algorithm for training text generation models. This paradigm relies on direct supervision examples, which are not available in many emerging applications, such as generating adversarial attacks or generating prompts to control language models. Reinforcement learning (RL), on the other hand, offers a more flexible solution by allowing users to plug in arbitrary task metrics as reward. Yet previous RL algorithms for text generation, such as policy gradient (on-policy RL) and Q-learning (off-policy RL), are often notoriously inefficient or unstable to train due to the large sequence space and the sparse reward received only at the end of sequences. In this paper, we introduce a new RL formulation for text generation from the soft Q-learning (SQL) perspective. It enables us to draw from the latest RL advances, such as path consistency learning, to combine the best of on-/off-policy updates, and learn effectively from sparse reward. We apply the approach to a wide range of novel text generation tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation. Experiments show our approach consistently outperforms both task-specialized algorithms and the previous RL methods.
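As a rough illustration of the soft Q-learning view described above, the sketch below treats per-token scores as Q-values, so that the soft state value is their log-sum-exp and the policy is a softmax over them. It also computes a single-step soft Bellman residual of the kind used in path consistency learning. The function names, the temperature of 1, and the toy numbers are assumptions made for illustration; this is not the paper's implementation.

```python
import math

def soft_value(q_values):
    """Soft state value V(s) = log sum_a exp(Q(s, a)), computed stably."""
    m = max(q_values)
    return m + math.log(sum(math.exp(q - m) for q in q_values))

def soft_policy(q_values):
    """Policy pi(a|s) = exp(Q(s, a) - V(s)), i.e. a softmax over Q-values."""
    v = soft_value(q_values)
    return [math.exp(q - v) for q in q_values]

def single_step_residual(q_t, action, reward, q_next, gamma=1.0, terminal=False):
    """Residual Q(s_t, a_t) - (r_t + gamma * V(s_{t+1})); V is 0 after the last token."""
    v_next = 0.0 if terminal else soft_value(q_next)
    return q_t[action] - (reward + gamma * v_next)

# Toy example: a 3-token vocabulary with arbitrarily chosen values.
q_t = [1.0, 2.0, 0.5]
q_next = [0.0, 0.0, 0.0]
probs = soft_policy(q_t)
residual = single_step_residual(q_t, action=1, reward=0.0, q_next=q_next)
loss = residual ** 2  # squared residual as a regression-style training signal
```

Because the reward is often zero until the final token, a residual of this form lets the value at intermediate steps be learned from the next step's value rather than from the end-of-sequence reward alone.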


On the Generation of Medical Dialogs for COVID-19
Meng Zhou | Zechen Li | Bowen Tan | Guangtao Zeng | Wenmian Yang | Xuehai He | Zeqian Ju | Subrato Chakravorty | Shu Chen | Xingyi Yang | Yichen Zhang | Qingyang Wu | Zhou Yu | Kun Xu | Eric Xing | Pengtao Xie
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

During the COVID-19 pandemic, people experiencing COVID-19-related symptoms have a pressing need to consult doctors. Because of the shortage of medical professionals, many people cannot receive online consultations in a timely manner. To address this problem, we aim to develop a medical dialog system that can provide COVID-19-related consultations. We collected two dialog datasets – CovidDialog – (in English and Chinese respectively) containing conversations between doctors and patients about COVID-19. While the largest of their kind, these two datasets are still relatively small compared with general-domain dialog datasets. Training complex dialog generation models on small datasets carries a high risk of overfitting. To alleviate overfitting, we develop a multi-task learning approach, which regularizes the data-deficient dialog generation task with a masked token prediction task. Experiments on the CovidDialog datasets demonstrate the effectiveness of our approach. We perform both human evaluation and automatic evaluation of dialogs generated by our method. Results show that the generated responses are promising in being doctor-like, relevant to conversation history, clinically informative and correct. The code and the data are available at
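The multi-task regularization described in this abstract can be pictured as a weighted sum of two losses. The sketch below is a minimal illustration under stated assumptions: the function names, the use of mean negative log-likelihood for both tasks, and the `mask_weight` value are hypothetical and are not taken from the paper.

```python
import math

def cross_entropy(probs_of_targets):
    """Mean negative log-likelihood, given the probability assigned to each target token."""
    return -sum(math.log(p) for p in probs_of_targets) / len(probs_of_targets)

def multitask_loss(gen_target_probs, masked_target_probs, mask_weight=0.5):
    """Dialog generation loss plus a weighted masked-token prediction loss.

    mask_weight is an assumed hyperparameter controlling the regularizer's strength.
    """
    gen_loss = cross_entropy(gen_target_probs)
    mask_loss = cross_entropy(masked_target_probs)
    return gen_loss + mask_weight * mask_loss

# Toy probabilities a model might assign to the correct tokens in each task.
total = multitask_loss([0.6, 0.7, 0.9], [0.5, 0.8], mask_weight=0.5)
```

Sharing model parameters between the two terms is what would let the auxiliary task constrain the generation task on a small dataset.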

Progressive Generation of Long Text with Pretrained Language Models
Bowen Tan | Zichao Yang | Maruan Al-Shedivat | Eric Xing | Zhiting Hu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Large-scale language models (LMs) pretrained on massive corpora of text, such as GPT-2, are powerful open-domain text generators. However, as our systematic examination reveals, it is still challenging for such models to generate coherent long passages of text (e.g., 1000 tokens), especially when the models are fine-tuned to the target domain on a small corpus. Previous planning-then-generation methods also fall short of producing such long text in various domains. To overcome the limitations, we propose a simple but effective method of generating text in a progressive manner, inspired by generating images from low to high resolution. Our method first produces domain-specific content keywords and then progressively refines them into complete passages in multiple stages. The simple design allows our approach to take advantage of pretrained LMs at each stage and effectively adapt to any target domain given only a small set of examples. We conduct a comprehensive empirical study with a broad set of evaluation metrics, and show that our approach significantly improves upon the fine-tuned large LMs and various planning-then-generation methods in terms of quality and sample efficiency. Human evaluation also validates that our model generations are more coherent.
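One way to picture the keywords-first, progressively refined pipeline is to look at how per-stage target sequences could be built from a full passage. The sketch below is an assumption-laden illustration: the `importance` scores, the `keep_fractions` values, and the function name are hypothetical, and the paper's actual procedure for choosing stage vocabularies may differ.

```python
def stage_targets(tokens, importance, keep_fractions=(0.2, 0.5, 1.0)):
    """Build one target sequence per stage, from sparse keywords to the full text.

    tokens: the full passage as a list of words.
    importance: hypothetical word -> score mapping (higher = more informative).
    keep_fractions: fraction of distinct words kept at each stage (assumed values).
    """
    # Rank distinct words by descending score, breaking ties alphabetically.
    ranked = sorted(set(tokens), key=lambda w: (-importance.get(w, 0.0), w))
    stages = []
    for frac in keep_fractions:
        keep = set(ranked[: max(1, round(frac * len(ranked)))])
        # Preserve original word order so each stage is a subsequence of the next.
        stages.append([w for w in tokens if w in keep])
    return stages

passage = "the storm hit the coast and flooded the old harbor".split()
scores = {"storm": 3.0, "flooded": 2.5, "harbor": 2.0,
          "coast": 1.5, "hit": 1.0, "old": 0.8}
for stage in stage_targets(passage, scores):
    print(" ".join(stage))
```

Under this setup, a separate pretrained LM could be fine-tuned for each stage to map one level of detail to the next, with the first stage generating the sparsest sequence from the prompt.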

Compression, Transduction, and Creation: A Unified Framework for Evaluating Natural Language Generation
Mingkai Deng | Bowen Tan | Zhengzhong Liu | Eric Xing | Zhiting Hu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Natural language generation (NLG) spans a broad range of tasks, each of which serves specific objectives and requires different properties of generated text. The complexity makes automatic evaluation of NLG particularly challenging. Previous work has typically focused on a single task and developed individual evaluation metrics based on specific intuitions. In this paper, we propose a unifying perspective based on the nature of information change in NLG tasks, including compression (e.g., summarization), transduction (e.g., text rewriting), and creation (e.g., dialog). _Information alignment_ between input, context, and output text plays a common central role in characterizing the generation. With automatic alignment prediction models, we develop a family of interpretable metrics that are suitable for evaluating key aspects of different NLG tasks, often without the need for gold reference data. Experiments show the uniformly designed metrics achieve stronger or comparable correlations with human judgement compared to state-of-the-art metrics in each of several diverse tasks, including text summarization, style transfer, and knowledge-grounded dialog.


Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach
Bowen Tan | Lianhui Qin | Eric Xing | Zhiting Hu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Given a document and a target aspect (e.g., a topic of interest), aspect-based abstractive summarization attempts to generate a summary with respect to the aspect. Previous studies usually assume a small pre-defined set of aspects and fall short of summarizing on other diverse topics. In this work, we study summarizing on arbitrary aspects relevant to the document, which significantly expands the application of the task in practice. Due to the lack of supervision data, we develop a new weak supervision construction method and an aspect modeling scheme, both of which integrate rich external knowledge sources such as ConceptNet and Wikipedia. Experiments show our approach achieves performance boosts on summarizing both real and synthetic documents given pre-defined or arbitrary aspects.


Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Zhiting Hu | Haoran Shi | Bowen Tan | Wentao Wang | Zichao Yang | Tiancheng Zhao | Junxian He | Lianhui Qin | Di Wang | Xuezhe Ma | Zhengzhong Liu | Xiaodan Liang | Wanrong Zhu | Devendra Sachan | Eric Xing
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations

We introduce Texar, an open-source toolkit aiming to support the broad set of text generation tasks that transform any inputs into natural language, such as machine translation, summarization, dialog, content manipulation, and so forth. With the design goals of modularity, versatility, and extensibility in mind, Texar extracts common patterns underlying the diverse tasks and methodologies, creates a library of highly reusable modules and functionalities, and allows arbitrary model architectures and algorithmic paradigms. In Texar, model architecture, inference, and learning processes are properly decomposed. Modules at a high concept level can be freely assembled or plugged in/swapped out. Texar is thus particularly suitable for researchers and practitioners to do fast prototyping and experimentation. The versatile toolkit also fosters technique sharing across different text generation tasks. Texar supports both TensorFlow and PyTorch, and is released under Apache License 2.0 at


Texar: A Modularized, Versatile, and Extensible Toolbox for Text Generation
Zhiting Hu | Zichao Yang | Tiancheng Zhao | Haoran Shi | Junxian He | Di Wang | Xuezhe Ma | Zhengzhong Liu | Xiaodan Liang | Lianhui Qin | Devendra Singh Chaplot | Bowen Tan | Xingjiang Yu | Eric Xing
Proceedings of Workshop for NLP Open Source Software (NLP-OSS)

We introduce Texar, an open-source toolkit aiming to support the broad set of text generation tasks. Unlike many existing toolkits that are specialized for specific applications (e.g., neural machine translation), Texar is designed to be highly flexible and versatile. This is achieved by abstracting the common patterns underlying the diverse tasks and methodologies, creating a library of highly reusable modules and functionalities, and enabling arbitrary model architectures and various algorithmic paradigms. These features make Texar particularly suitable for technique sharing and generalization across different text generation applications. The toolkit places heavy emphasis on extensibility and modularized system design, so that components can be freely plugged in or swapped out. We conduct extensive experiments and case studies to demonstrate the use and advantages of the toolkit.

Structured Dialogue Policy with Graph Neural Networks
Lu Chen | Bowen Tan | Sishan Long | Kai Yu
Proceedings of the 27th International Conference on Computational Linguistics

Recently, deep reinforcement learning (DRL) has been used for dialogue policy optimization. However, many DRL-based policies are not sample-efficient. Most recent advances focus on improving DRL optimization algorithms to address this issue. Here, we take an alternative route of designing a neural network structure that is better suited for DRL-based dialogue management. The proposed structured deep reinforcement learning is based on a graph neural network (GNN), which consists of several sub-networks, one for each node of a directed graph. The graph is defined according to the domain ontology and each node can be considered as a sub-agent. During decision making, these sub-agents exchange internal messages with their neighbors on the graph. We also propose an approach to jointly optimize the graph structure as well as the parameters of the GNN. Experiments show that structured DRL significantly outperforms previous state-of-the-art approaches in almost all of the 18 tasks of the PyDial benchmark.
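The message exchange between sub-agents can be sketched as one round of message passing on a directed graph. This is a simplified illustration, not the paper's architecture: the node names, the fixed scalar mixing weights (which would be learned sub-networks in a real GNN), and the mean aggregation are all assumptions.

```python
def message_passing_step(states, in_neighbors, self_weight=0.5, msg_weight=0.5):
    """One round of exchange: each node mixes its state with the mean of incoming messages.

    states: node name -> list of floats (the sub-agent's hidden state).
    in_neighbors: node name -> list of nodes that send messages to it.
    """
    new_states = {}
    for node, state in states.items():
        senders = in_neighbors.get(node, [])
        if senders:
            mean_msg = [sum(states[s][i] for s in senders) / len(senders)
                        for i in range(len(state))]
        else:
            mean_msg = [0.0] * len(state)  # isolated node receives nothing
        new_states[node] = [self_weight * x + msg_weight * m
                            for x, m in zip(state, mean_msg)]
    return new_states

# Hypothetical ontology: two slot nodes and one slot-independent node.
states = {"food": [1.0, 0.0], "area": [0.0, 1.0], "general": [0.0, 0.0]}
in_neighbors = {
    "food": ["area", "general"],
    "area": ["food", "general"],
    "general": ["food", "area"],
}
updated = message_passing_step(states, in_neighbors)
```

After a few such rounds, each sub-agent's state reflects information from its neighbors, and a per-node output layer could then score that node's candidate actions.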