Dong Wang


2025

TALON: A Multi-Agent Framework for Long-Table Exploration and Question Answering
Ruochun Jin | Xiyue Wang | Dong Wang | Haoqi Zheng | Yunpeng Qi | Silin Yang | Meng Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Table question answering (TQA) requires accurate retrieval and reasoning over tabular data. Existing approaches attempt to retrieve query-relevant content before leveraging large language models (LLMs) to reason over long tables. However, these methods often fail to accurately retrieve contextually relevant data, which results in information loss, and they suffer from excessive encoding overhead. In this paper, we propose TALON, a multi-agent framework designed for question answering over long tables. TALON features a planning agent that iteratively invokes a tool agent to access and manipulate tabular data based on intermediate feedback, progressively collecting the information necessary for answer generation, while a critic agent ensures accuracy and efficiency in tool usage and planning. To comprehensively assess the effectiveness of TALON, we introduce two benchmarks derived from the WikiTableQuestions and BIRD-SQL datasets, which contain tables ranging from 50 to over 10,000 rows. Experiments demonstrate that TALON achieves average accuracy improvements of 7.5% and 12.0% across all language models, establishing a new state of the art in long-table question answering. Our code is publicly available at: https://github.com/Wwestmoon/TALON.
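
As a rough illustration of the planner-tool-critic loop the abstract describes, here is a toy Python sketch; the real agents are LLM-driven, so every function name, tool name, and the hard-coded table below are illustrative assumptions, not TALON's actual interface.

```python
# Toy sketch of TALON's agent loop (assumed names, hard-coded "agents"):
# a planning step requests table operations, a tool agent executes them,
# and a critic agent checks each intermediate result.

TABLE = [  # stand-in for a long table
    {"country": "France", "gold": 10},
    {"country": "Italy", "gold": 8},
    {"country": "Spain", "gold": 12},
]

def tool_agent(action, args):
    """Execute a table operation requested by the planner."""
    if action == "filter":
        return [row for row in TABLE if args["predicate"](row)]
    if action == "argmax":
        return max(args["rows"], key=lambda row: row[args["column"]])
    raise ValueError(f"unknown tool: {action}")

def critic_agent(action, result):
    """Sanity-check a tool result before the planner proceeds."""
    if action == "filter" and not result:
        return "filter matched nothing; relax the predicate"
    return "ok"

def planning_agent(question):
    """Collect the needed information step by step, then answer."""
    rows = tool_agent("filter", {"predicate": lambda row: row["gold"] > 0})
    assert critic_agent("filter", rows) == "ok"
    best = tool_agent("argmax", {"rows": rows, "column": "gold"})
    return best["country"]

print(planning_agent("Which country won the most gold medals?"))  # Spain
```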

Logical DA: Enhancing Data Augmentation for Logical Reasoning via a Multi-Agent System
Haoqi Zheng | Dong Wang | Silin Yang | Yunpeng Qi | Ruochun Jin | Liyang Xu
Findings of the Association for Computational Linguistics: ACL 2025

Recent advancements in large language models (LLMs) have highlighted the importance of improving their reasoning capabilities. A critical challenge lies in the scarcity of high-quality reasoning data, characterized by diversity and rich supervisory signals, that is necessary for robust model training. While data augmentation (DA) methods have been leveraged to mitigate this scarcity, prevailing approaches often introduce noise and exhibit logical inconsistencies, thereby diminishing their utility for complex reasoning tasks. Moreover, existing DA paradigms predominantly isolate data synthesis from label validation, failing to unify these complementary processes within a cohesive architecture. To address these limitations, we introduce Logical DA, a multi-agent framework for enhancing reasoning-focused data augmentation in few-shot learning scenarios. Our system includes four agents operating through two synergistic phases: (1) diverse data generation, and (2) label verification. The system incorporates a reflection mechanism that continuously improves data quality by leveraging feedback from logical validation. We demonstrate the effectiveness of Logical DA through experiments on various tasks and datasets, achieving the highest average improvement in task accuracy in both fine-tuning and in-context learning paradigms, with an average improvement of 7.61% when applied to fine-tuning.
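
The two-phase loop with reflection can be sketched abstractly as below; the three callables stand in for the paper's LLM-driven agents, and the loop shape is the only part taken from the abstract.

```python
# Minimal sketch of a generate -> verify -> reflect augmentation loop,
# assuming placeholder callables for the LLM-driven agents.
def augment(seed_examples, generate, verify, reflect, rounds=3):
    data, feedback = [], None
    for _ in range(rounds):
        candidates = generate(seed_examples, feedback)   # phase 1: diverse generation
        verified = [c for c in candidates if verify(c)]  # phase 2: label verification
        feedback = reflect(candidates, verified)         # reflection on rejected samples
        data.extend(verified)
    return data
```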

Zero-Shot Cross-Domain Aspect-Based Sentiment Analysis via Domain-Contextualized Chain-of-Thought Reasoning
Chuming Shen | Wei Wei | Dong Wang | Zhong-Hao Wang
Findings of the Association for Computational Linguistics: EMNLP 2025

Cross-domain aspect-based sentiment analysis (ABSA) aims to learn specific knowledge from a source domain in order to perform various ABSA tasks on a target domain. Recent works mainly focus on using domain adaptation techniques to transfer domain-agnostic features from the labeled source domain to the unlabeled target domain. However, manually collecting a large amount of unlabeled data from the target domain is often impractical, since such data may be unavailable owing to, for example, data security concerns in banking or insurance. To alleviate this issue, we propose ZeroABSA, a unified zero-shot learning framework for cross-domain ABSA that effectively eliminates the dependency on target-domain annotations. Specifically, ZeroABSA consists of two novel components: (1) a hybrid data augmentation module that leverages large language models (LLMs) to synthesize high-quality, domain-adaptive target-domain data by evaluating the generated samples on vocabulary richness, semantic coherence, and sentiment/domain consistency, followed by iterative refinement; and (2) a domain-aware chain-of-thought (CoT) prompting strategy that trains models on the augmented data while explicitly modeling domain-invariant reasoning to bridge the well-known cross-domain gap. Extensive evaluations across four diverse domains demonstrate that ZeroABSA surpasses the state of the art, effectively advancing the practicality of cross-domain ABSA in real-world scenarios where labeled target-domain data is unavailable.
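
A hedged sketch of the sample-filtering idea in component (1): score each generated sample on simple proxies and keep only the strong ones. The scorers and threshold below are assumptions; the paper's actual criteria and iterative refinement are richer.

```python
# Toy quality gate for LLM-generated target-domain samples (assumed proxies).
def vocabulary_richness(text: str) -> float:
    words = text.lower().split()
    return len(set(words)) / max(len(words), 1)

def keep_sample(text, coherence_score, consistency_score, threshold=0.6):
    """Average three quality proxies and accept the sample if high enough."""
    scores = [vocabulary_richness(text), coherence_score, consistency_score]
    return sum(scores) / len(scores) >= threshold
```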

2024

Retrieval Augmented Fact Verification by Synthesizing Contrastive Arguments
Zhenrui Yue | Huimin Zeng | Lanyu Shang | Yifan Liu | Yang Zhang | Dong Wang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The rapid propagation of misinformation poses substantial risks to the public interest. To combat misinformation, large language models (LLMs) are adapted to automatically verify claim credibility. Nevertheless, existing methods rely heavily on the knowledge embedded within LLMs and/or on black-box APIs for evidence collection, leading to subpar performance with smaller LLMs or under unreliable context. In this paper, we propose retrieval augmented fact verification through the synthesis of contrasting arguments (RAFTS). Given an input claim, RAFTS starts with evidence retrieval, where we design a retrieval pipeline to collect and re-rank relevant documents from verifiable sources. Then, RAFTS forms contrastive arguments (i.e., supporting or refuting) conditioned on the retrieved evidence. In addition, RAFTS leverages an embedding model to identify informative demonstrations, followed by in-context prompting to generate the prediction and explanation. Our method effectively retrieves relevant documents as evidence and evaluates arguments from varying perspectives, incorporating nuanced information for fine-grained decision-making. Combined with informative in-context examples as priors, RAFTS achieves significant improvements over supervised and LLM baselines without complex prompts. We demonstrate the effectiveness of our method through extensive experiments, where RAFTS outperforms GPT-based methods with a significantly smaller 7B LLM.
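
The retrieve-then-re-rank step and the contrastive-argument prompts might look roughly like the sketch below; the bag-of-words scorer stands in for the paper's unspecified retriever and re-ranker, and the prompt templates are invented for illustration.

```python
# Sketch of a two-stage retrieve/re-rank pipeline plus contrastive prompts
# (all scoring choices and templates are assumptions).
import numpy as np
from collections import Counter

def bow_vector(text, vocab):
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def retrieve_and_rerank(claim, documents, k_retrieve=100, k_final=5):
    vocab = sorted({w for d in documents + [claim] for w in d.lower().split()})
    q = bow_vector(claim, vocab)
    scored = []
    for doc in documents:
        d = bow_vector(doc, vocab)
        sim = d @ q / (np.linalg.norm(d) * np.linalg.norm(q) + 1e-9)
        scored.append((sim, doc))
    candidates = sorted(scored, reverse=True)[:k_retrieve]  # stage 1: retrieve
    return [doc for _, doc in candidates[:k_final]]         # stage 2: re-rank (same scorer here)

def contrastive_prompts(claim, evidence):
    """Form the supporting and refuting argument prompts."""
    support = f"Evidence: {evidence}\nArgue that this claim is TRUE: {claim}"
    refute = f"Evidence: {evidence}\nArgue that this claim is FALSE: {claim}"
    return support, refute
```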

Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation
Zhenrui Yue | Huimin Zeng | Yimeng Lu | Lanyu Shang | Yang Zhang | Dong Wang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

The proliferation of online misinformation poses significant threats to the public interest. While numerous online users actively participate in combating misinformation, many of their responses lack politeness and supporting facts. As a solution, text generation approaches have been proposed to automatically produce counter-misinformation responses. Nevertheless, existing methods are often trained end-to-end without leveraging external knowledge, resulting in subpar text quality and excessively repetitive responses. In this paper, we propose retrieval augmented response generation for online misinformation (RARG), which collects supporting evidence from scientific sources and generates counter-misinformation responses based on that evidence. In particular, RARG consists of two stages: (1) evidence collection, where we design a retrieval pipeline to retrieve and re-rank evidence documents from a database comprising over 1M academic articles; and (2) response generation, in which we align large language models (LLMs) to generate evidence-based responses via reinforcement learning from human feedback (RLHF). We propose a reward function that maximizes the utilization of the retrieved evidence while maintaining the quality of the generated text, which yields polite and factual responses that clearly refute the misinformation. To demonstrate the effectiveness of our method, we study the case of COVID-19 and perform extensive experiments with both in- and cross-domain datasets, where RARG consistently outperforms baselines by generating high-quality counter-misinformation responses.
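
The reward described here trades off evidence utilization against text quality; a minimal sketch of such a reward, with an assumed linear combination and weight, is:

```python
# Assumed-form reward: evidence utilization blended with a quality score.
def reward(response_tokens, evidence_tokens, quality_score, alpha=0.5):
    """quality_score: e.g. output of a fluency/politeness model in [0, 1]."""
    overlap = set(response_tokens) & set(evidence_tokens)
    utilization = len(overlap) / max(len(set(evidence_tokens)), 1)
    return alpha * utilization + (1 - alpha) * quality_score
```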

Open-Vocabulary Federated Learning with Multimodal Prototyping
Huimin Zeng | Zhenrui Yue | Dong Wang
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Existing federated learning (FL) studies usually assume that the training label space and the test label space are identical. However, in real-world applications, this assumption is too ideal to be true. A new user could come up with queries that involve data from unseen classes, and such open-vocabulary queries would directly defeat such FL systems. Therefore, in this work, we explicitly focus on the under-explored open-vocabulary challenge in FL: for a new user, the global server shall understand queries that involve arbitrary unknown classes. To address this problem, we leverage pre-trained vision-language models (VLMs). In particular, we present a novel adaptation framework tailored for VLMs in the context of FL, named Federated Multimodal Prototyping (Fed-MP). Fed-MP adaptively aggregates the local model weights based on light-weight client residuals, and makes predictions based on a novel multimodal prototyping mechanism. Fed-MP exploits the knowledge learned from the seen classes and robustifies the adapted VLM to unseen categories. Our empirical evaluation on various datasets validates the effectiveness of Fed-MP.
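
One way to picture the multimodal prototyping step is the numpy sketch below: build one prototype per class and classify queries by similarity to the nearest prototype. Averaging normalized text and image embeddings is an assumption; the paper's actual prototype construction is not specified here.

```python
# Prototype-based open-vocabulary prediction (assumed construction).
import numpy as np

def normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-9)

def build_prototypes(text_embs, image_embs):
    """One prototype per class from its text and image embeddings."""
    return normalize(normalize(text_embs) + normalize(image_embs))

def predict(query_emb, prototypes):
    """Return the class whose prototype is most similar to the query."""
    return int(np.argmax(prototypes @ normalize(query_emb)))

rng = np.random.default_rng(0)
protos = build_prototypes(rng.standard_normal((3, 8)), rng.standard_normal((3, 8)))
print(predict(rng.standard_normal(8), protos))
```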

2023

KEPL: Knowledge Enhanced Prompt Learning for Chinese Hypernym-Hyponym Extraction
Ningchen Ma | Dong Wang | Hongyun Bao | Lei He | Suncong Zheng
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Modeling hypernym-hyponym (“is-a”) relations is very important for many natural language processing (NLP) tasks, such as classification, natural language inference, and relation extraction. Existing work on is-a relation extraction mostly targets English. Due to the flexibility of language expression and the lack of high-quality Chinese annotated datasets, accurately identifying such relations in unstructured Chinese text remains a challenge. To tackle this problem, we propose a Knowledge Enhanced Prompt Learning (KEPL) method for Chinese hypernym-hyponym relation extraction. Our model uses Hearst-like patterns as prior knowledge and exploits a Dynamic Adaptor Architecture to select the pattern that matches the text and incorporate it into the prompt, embedding patterns and text simultaneously. Additionally, we construct a Chinese hypernym-hyponym relation extraction dataset covering three typical scenarios: baike (encyclopedia), news, and We-media. The experimental results on this dataset demonstrate the efficiency and effectiveness of our proposed model.
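
To make the Hearst-like pattern prior concrete, here is a toy extractor over Chinese text; the two patterns are common textbook examples, not the paper's actual pattern inventory.

```python
# Toy Hearst-style "is-a" extraction for Chinese (assumed pattern list).
import re

PATTERNS = [
    re.compile(r"(?P<hypo>\w+)是一种(?P<hyper>\w+)"),   # "X is a kind of Y"
    re.compile(r"(?P<hyper>\w+)，?例如(?P<hypo>\w+)"),  # "Y, for example X"
]

def extract_isa(text):
    pairs = []
    for pattern in PATTERNS:
        for m in pattern.finditer(text):
            pairs.append((m.group("hypo"), m.group("hyper")))
    return pairs

print(extract_isa("企鹅是一种鸟类"))  # [('企鹅', '鸟类')]
```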

2022

Domain Adaptation for Question Answering via Question Classification
Zhenrui Yue | Huimin Zeng | Ziyi Kou | Lanyu Shang | Dong Wang
Proceedings of the 29th International Conference on Computational Linguistics

Question answering (QA) has demonstrated impressive progress in answering questions from customized domains. Nevertheless, domain adaptation remains one of the most elusive challenges for QA systems, especially when they are trained in a source domain but deployed in a different target domain. In this work, we investigate the potential benefits of question classification for QA domain adaptation. We propose a novel framework, Question Classification for Question Answering (QC4QA). Specifically, a question classifier is adopted to assign question classes to both the source and target data. Then, we perform joint training in a self-supervised fashion via pseudo-labeling. For optimization, the inter-domain discrepancy between the source and target domains is reduced via the maximum mean discrepancy (MMD) distance. We additionally minimize the intra-class discrepancy among QA samples of the same question class for fine-grained adaptation performance. To the best of our knowledge, this is the first work in QA domain adaptation to leverage question classification with self-supervised adaptation. We demonstrate the effectiveness of the proposed QC4QA through consistent improvements over state-of-the-art baselines on multiple datasets.
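
The MMD term used for reducing inter-domain discrepancy is standard; a compact numpy version with an RBF kernel (a single fixed bandwidth, an assumption made here for brevity) is:

```python
# Squared MMD between source and target feature batches (RBF kernel).
import numpy as np

def rbf(x, y, sigma=1.0):
    d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(source, target, sigma=1.0):
    return (rbf(source, source, sigma).mean()
            + rbf(target, target, sigma).mean()
            - 2 * rbf(source, target, sigma).mean())

src = np.random.default_rng(0).standard_normal((32, 8))
tgt = np.random.default_rng(1).standard_normal((32, 8)) + 0.5
print(mmd2(src, tgt))  # larger when the two domains differ more
```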

QA Domain Adaptation using Hidden Space Augmentation and Self-Supervised Contrastive Adaptation
Zhenrui Yue | Huimin Zeng | Bernhard Kratzwald | Stefan Feuerriegel | Dong Wang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Question answering (QA) has recently shown impressive results for answering questions from customized domains. Yet, a common challenge is adapting QA models to an unseen target domain. In this paper, we propose a novel self-supervised framework called QADA for QA domain adaptation. QADA introduces a novel pipeline for augmenting training QA samples. Different from existing methods, we enrich the samples via hidden space augmentation. For questions, we introduce multi-hop synonyms and sample augmented token embeddings from Dirichlet distributions. For contexts, we develop an augmentation method that learns to drop context spans via a custom attentive sampling strategy. Additionally, contrastive learning is integrated into the proposed self-supervised adaptation framework. Unlike existing approaches, we generate pseudo labels and train the model via a novel attention-based contrastive adaptation method: the attention weights are used to build informative features for discrepancy estimation, which helps the QA model separate answers and generalize across the source and target domains. To the best of our knowledge, our work is the first to leverage hidden space augmentation and attention-based contrastive adaptation for self-supervised domain adaptation in QA. Our evaluation shows that QADA achieves considerable improvements over state-of-the-art baselines on multiple target datasets.
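
The Dirichlet-based question augmentation can be sketched in a few lines: draw convex mixing weights over a token's multi-hop synonym embeddings. The synonym sets and the concentration parameter are illustrative assumptions.

```python
# Hidden-space token augmentation via Dirichlet-weighted synonym mixing.
import numpy as np

def augment_token(synonym_embeddings, alpha=1.0, rng=None):
    """synonym_embeddings: (n_synonyms, dim) matrix for one token."""
    rng = rng if rng is not None else np.random.default_rng()
    weights = rng.dirichlet(alpha * np.ones(len(synonym_embeddings)))
    return weights @ synonym_embeddings  # convex combination of synonym vectors
```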

2021

CLINE: Contrastive Learning with Semantic Negative Examples for Natural Language Understanding
Dong Wang | Ning Ding | Piji Li | Haitao Zheng
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Although pre-trained language models have proven useful for learning high-quality semantic representations, these models remain vulnerable to simple perturbations. Recent works aiming to improve the robustness of pre-trained models mainly focus on adversarial training with perturbed examples of similar semantics, neglecting perturbations with different or even opposite semantics. Unlike in image processing, text is discrete, and a few word substitutions can cause significant semantic changes. To study the impact of such small perturbations on semantics, we conduct a series of pilot experiments and, surprisingly, find that adversarial training is useless or even harmful for detecting these semantic changes. To address this problem, we propose Contrastive Learning with semantIc Negative Examples (CLINE), which constructs semantic negative examples in an unsupervised manner to improve robustness under semantic adversarial attacks. By comparing with both similar and opposite semantic examples, the model can effectively perceive the semantic changes caused by small perturbations. Empirical results show that our approach yields substantial improvements on a range of sentiment analysis, reasoning, and reading comprehension tasks. CLINE also ensures compactness within the same semantics and separability across different semantics at the sentence level.
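
A toy version of the unsupervised negative construction: substitute synonyms to keep semantics (positive example) and antonyms to flip them (negative example). The one-entry lexicon stands in for the WordNet-style resources such methods typically rely on.

```python
# Toy CLINE-style positive/negative construction (assumed tiny lexicon).
SYNONYMS = {"good": "great"}
ANTONYMS = {"good": "bad"}

def make_pair(sentence):
    words = sentence.split()
    positive = [SYNONYMS.get(w, w) for w in words]  # similar semantics
    negative = [ANTONYMS.get(w, w) for w in words]  # opposite semantics
    return " ".join(positive), " ".join(negative)

print(make_pair("the movie is good"))
# ('the movie is great', 'the movie is bad')
```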

2020

Conversational Word Embedding for Retrieval-Based Dialog System
Wentao Ma | Yiming Cui | Ting Liu | Dong Wang | Shijin Wang | Guoping Hu
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Human conversations contain many types of information, e.g., knowledge, common sense, and language habits. In this paper, we propose a conversational word embedding method named PR-Embedding, which utilizes conversation pairs <post, reply> to learn word embeddings. Different from previous works, PR-Embedding uses vectors from two different semantic spaces to represent the words in the post and in the reply. To capture the information shared within the pair, we first introduce the word alignment model from statistical machine translation to generate a cross-sentence window, and then train the embeddings at the word and sentence levels. We evaluate the method on single-turn and multi-turn response selection tasks for retrieval-based dialog systems. The experimental results show that PR-Embedding can improve the quality of the selected responses.
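
The two-space idea can be pictured as below: one embedding matrix for post-side words and another for reply-side words. The mean-vector dot-product scorer is an illustrative assumption, not the paper's training objective.

```python
# Two separate embedding spaces for post and reply words (assumed scorer).
import numpy as np

rng = np.random.default_rng(0)
VOCAB = {"hello": 0, "hi": 1, "bye": 2}
E_POST = rng.standard_normal((len(VOCAB), 16))   # post-side space
E_REPLY = rng.standard_normal((len(VOCAB), 16))  # reply-side space

def score(post, reply):
    """Match a candidate reply against a post across the two spaces."""
    p = E_POST[[VOCAB[w] for w in post.split()]].mean(axis=0)
    r = E_REPLY[[VOCAB[w] for w in reply.split()]].mean(axis=0)
    return float(p @ r)

print(score("hello", "hi bye"))
```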

Summarize before Aggregate: A Global-to-local Heterogeneous Graph Inference Network for Conversational Emotion Recognition
Dongming Sheng | Dong Wang | Ying Shen | Haitao Zheng | Haozhuang Liu
Proceedings of the 28th International Conference on Computational Linguistics

Conversational Emotion Recognition (CER) is a crucial task in Natural Language Processing (NLP) with wide applications. Prior works in CER generally focus on modeling emotion influences solely with utterance-level features, paying little attention to the phrase-level semantic connections between utterances. Phrases carry sentiment when they refer to emotional events under certain topics, providing a global semantic connection between utterances throughout the entire conversation. In this work, we propose a two-stage Summarization and Aggregation Graph Inference Network (SumAggGIN), which seamlessly integrates inference over topic-related emotional phrases and local dependency reasoning over neighbouring utterances in a global-to-local fashion. Topic-related emotional phrases, which constitute the global topic-related emotional connections, are recognized by our proposed heterogeneous Summarization Graph. Local dependencies, which capture short-term emotional effects between neighbouring utterances, are further injected via an Aggregation Graph to distinguish the subtle differences between utterances containing emotional phrases. The two steps of graph inference are tightly coupled for a comprehensive understanding of emotional fluctuation. Experimental results on three CER benchmark datasets verify the effectiveness of our proposed model, which outperforms state-of-the-art approaches.

Integrating User History into Heterogeneous Graph for Dialogue Act Recognition
Dong Wang | Ziran Li | Haitao Zheng | Ying Shen
Proceedings of the 28th International Conference on Computational Linguistics

Dialogue Act Recognition (DAR) is a challenging problem in Natural Language Understanding, which aims to attach Dialogue Act (DA) labels to each utterance in a conversation. However, previous studies cannot fully recognize the specific expressions given by users due to the informality and diversity of natural language expressions. To solve this problem, we propose a Heterogeneous User History (HUH) graph convolution network, which utilizes a user's historical answers, grouped by DA labels, as additional clues to recognize the DA labels of utterances. To handle the noise caused by introducing these historical answers, we design a set of denoising mechanisms, including a History Selection process, a Similarity Re-weighting process, and an Edge Re-weighting process. We evaluate the proposed method on two benchmark datasets, MSDialog and MRDA. The experimental results verify the effectiveness of integrating users' historical answers, and show that our proposed model outperforms state-of-the-art methods.
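
Of the three denoising mechanisms, Similarity Re-weighting is the easiest to sketch: down-weight graph edges to historical answers that are dissimilar to the current utterance. Cosine similarity over feature vectors is an assumed choice here.

```python
# Assumed similarity re-weighting of edges to a user's historical answers.
import numpy as np

def reweight_edges(utterance_vec, history_vecs):
    u = utterance_vec / (np.linalg.norm(utterance_vec) + 1e-9)
    h = history_vecs / (np.linalg.norm(history_vecs, axis=1, keepdims=True) + 1e-9)
    sims = h @ u                      # cosine similarity per historical answer
    return np.clip(sims, 0.0, None)   # dissimilar answers get (near-)zero weight
```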

2017

Memory-augmented Neural Machine Translation
Yang Feng | Shiyue Zhang | Andi Zhang | Dong Wang | Andrew Abel
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

Neural machine translation (NMT) has achieved notable success in recent times; however, it is also widely recognized that this approach has limitations in handling infrequent words and word pairs. This paper presents a novel memory-augmented NMT (M-NMT) architecture, which stores, in a memory, knowledge about how words (usually infrequently encountered ones) should be translated, and then uses this memory to assist the neural model. We use this memory mechanism to combine the knowledge learned by a conventional statistical machine translation system and the rules learned by an NMT system, and we also propose a solution for out-of-vocabulary (OOV) words based on this framework. Our experiments on two Chinese-English translation tasks demonstrated that the M-NMT architecture outperformed the NMT baseline by 9.0 and 2.7 BLEU points on the two tasks, respectively. Additionally, we found that this architecture results in much more effective OOV treatment than competitive methods.
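
The memory mechanism can be caricatured as interpolating the neural model's output distribution with a stored translation rule; the interpolation scheme, its weight, and the one-entry memory below are assumptions.

```python
# Toy memory-augmented decoding step (assumed interpolation scheme).
import numpy as np

TGT_VOCAB = {"house": 0, "home": 1, "building": 2}
MEMORY = {"房子": "house"}  # stored rule for an infrequent source word

def combine(p_nmt, source_word, lam=0.3):
    """Blend NMT probabilities with a one-hot suggestion from the memory."""
    p = np.asarray(p_nmt, dtype=float)
    if source_word in MEMORY:
        one_hot = np.zeros(len(TGT_VOCAB))
        one_hot[TGT_VOCAB[MEMORY[source_word]]] = 1.0
        p = (1 - lam) * p + lam * one_hot
    return p / p.sum()

print(combine([0.2, 0.5, 0.3], "房子"))  # probability mass shifts toward "house"
```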

Discourse Mode Identification in Essays
Wei Song | Dong Wang | Ruiji Fu | Lizhen Liu | Ting Liu | Guoping Hu
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Discourse modes play an important role in writing composition and evaluation. This paper presents a study on the manual and automatic identification of narration, exposition, description, argument, and emotion-expressing sentences in narrative essays. We annotate a corpus to study the characteristics of discourse modes and describe a neural sequence labeling model for their identification. Evaluation results show that discourse modes can be identified automatically with an average F1-score of 0.7. We further demonstrate that discourse modes can be used as features that improve automatic essay scoring (AES). The impact of discourse modes on AES is also discussed.

Flexible and Creative Chinese Poetry Generation Using Neural Memory
Jiyuan Zhang | Yang Feng | Dong Wang | Yang Wang | Andrew Abel | Shiyue Zhang | Andi Zhang
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

It has been shown that Chinese poems can be successfully generated by sequence-to-sequence neural models, particularly with the attention mechanism. A potential problem of this approach, however, is that neural models can only learn abstract rules, while poem generation is a highly creative process that involves not only rules but also innovations for which pure statistical models are not appropriate in principle. This work proposes a memory augmented neural model for Chinese poem generation, where the neural model and the augmented memory work together to balance the requirements of linguistic accordance and aesthetic innovation, leading to innovative generations that are still rule-compliant. In addition, it is found that the memory mechanism provides interesting flexibility that can be used to generate poems with different styles.

2015

Stochastic Top-k ListNet
Tianyi Luo | Dong Wang | Rong Liu | Yiqiao Pan
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Normalized Word Embedding and Orthogonal Transform for Bilingual Word Translation
Chao Xing | Dong Wang | Chao Liu | Yiye Lin
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Joint Semantic Relevance Learning with Text Data and Graph Knowledge
Dongxu Zhang | Bin Yuan | Dong Wang | Rong Liu
Proceedings of the 3rd Workshop on Continuous Vector Space Models and their Compositionality

2012

Tweet Ranking Based on Heterogeneous Networks
Hongzhao Huang | Arkaitz Zubiaga | Heng Ji | Hongbo Deng | Dong Wang | Hieu Le | Tarek Abdelzaher | Jiawei Han | Alice Leung | John Hancock | Clare Voss
Proceedings of COLING 2012

Effort of Genre Variation and Prediction of System Performance
Dong Wang | Fei Xia
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Domain adaptation is an important task for NLP systems to work well in real applications, and there has been extensive research on this topic. In this paper, we address two issues related to domain adaptation. The first is how much genre variation affects NLP systems' performance. We investigate the effect of genre variation on the performance of three NLP tools, namely a word segmenter, a POS tagger, and a parser, choosing the Chinese Penn Treebank (CTB) as our corpus. The second is how to estimate an NLP system's performance when no gold standard exists for the test data. To answer this question, we extend the prediction model of (Ravi et al., 2008) to also provide predictions for word segmentation and POS tagging. Our experiments show that the predicted scores are close to the real scores when tested on the CTB data.

A Two-step Approach to Sentence Compression of Spoken Utterances
Dong Wang | Xian Qian | Yang Liu
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2011

A Pilot Study of Opinion Summarization in Conversations
Dong Wang | Yang Liu
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

A Cross-corpus Study of Unsupervised Subjectivity Identification based on Calibrated EM
Dong Wang | Yang Liu
Proceedings of the 2nd Workshop on Computational Approaches to Subjectivity and Sentiment Analysis (WASSA 2.011)

2010

Improving Blog Polarity Classification via Topic Analysis and Adaptive Methods
Feifan Liu | Dong Wang | Bin Li | Yang Liu
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics