2025
pdf
bib
abs
Enhancing Retrieval Systems with Inference-Time Logical Reasoning
Felix Faltings
|
Wei Wei
|
Yujia Bao
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Traditional retrieval methods rely on transforming user queries into vector representations and retrieving documents based on cosine similarity within an embedding space. While efficient and scalable, this approach often fails to handle complex queries involving logical constructs such as negations, conjunctions, and disjunctions. In this paper, we propose a novel inference-time logical reasoning framework that explicitly incorporates logical reasoning into the retrieval process. Our method extracts logical reasoning structures from natural language queries and then composes the individual cosine similarity matching scores to formulate the final document scores. This approach enables the retrieval process to handle complex logical reasoning without compromising computational efficiency. Our results on both synthetic and real-world benchmarks demonstrate that the proposed method consistently outperforms traditional retrieval methods across different models and datasets, significantly improving retrieval performance for complex queries.
pdf
bib
abs
Enhancing Retrieval for ESGLLM via ESG-CID: A Disclosure Content Index Finetuning Dataset for Mapping GRI and ESRS
Shafiuddin Rehan Ahmed
|
Ankit Shah
|
Quan Hung Tran
|
Vivek Khetan
|
Sukryool Kang
|
Ankit Mehta
|
Yujia Bao
|
Wei Wei
Proceedings of the 2nd Workshop on Natural Language Processing Meets Climate Change (ClimateNLP 2025)
Climate change has intensified the need for transparency and accountability in organizational practices, making Environmental, Social, and Governance (ESG) reporting increasingly crucial. Frameworks like the Global Reporting Initiative (GRI) and the new European Sustainability Reporting Standards (ESRS) aim to standardize ESG reporting, yet generating comprehensive reports remains challenging due to the considerable length of ESG documents and variability in company reporting styles. To facilitate ESG report automation, Retrieval-Augmented Generation (RAG) systems can be employed, but their development is hindered by a lack of labeled data suitable for training retrieval models. In this paper, we leverage an underutilized source of weak supervision—the disclosure content index found in past ESG reports—to create a comprehensive dataset, ESG-CID, for both GRI and ESRS standards. By extracting mappings between specific disclosure requirements and corresponding report sections, and refining them using a Large Language Model as a judge, we generate a robust training and evaluation set. We benchmark popular embedding models on this dataset and show that fine-tuning BERT-based models can outperform commercial embeddings and leading public models, even under temporal data splits for cross-report style transfer from GRI to ESRS.
pdf
bib
abs
Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning
Xiaoye Qu
|
Jiashuo Sun
|
Wei Wei
|
Daizong Liu
|
Jianfeng Dong
|
Yu Cheng
Proceedings of the 31st International Conference on Computational Linguistics
Recently, Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities in multi-modal context comprehension. However, they still suffer from hallucination problems referring to generating inconsistent outputs with the image content. To mitigate hallucinations, previous studies mainly focus on retraining LVLMs with custom datasets. Although effective, they inherently come with additional computational costs. In this paper, we propose a training-free framework, MVP, that aims to reduce hallucinations by making the most of the innate capabilities of the LVLMs via Multi-View Multi-Path Reasoning. Specifically, we first devise a multi-view information-seeking strategy to thoroughly perceive the comprehensive information in the image, which enriches the general global information captured by the original vision encoder in LVLMs. Furthermore, during the answer decoding, we propose multi-path reasoning for each information view to quantify and aggregate the certainty scores for each potential answer among multiple decoding paths and finally decide the output answer. By fully grasping the information in the image and carefully considering the certainty of the potential answers when decoding, our MVP can effectively reduce hallucinations in LVLMs. The extensive experiments verify that our proposed MVP significantly mitigates the hallucination problem across four well-known LVLMs.
pdf
bib
abs
Modal Feature Optimization Network with Prompt for Multimodal Sentiment Analysis
Xiangmin Zhang
|
Wei Wei
|
Shihao Zou
Proceedings of the 31st International Conference on Computational Linguistics
Multimodal sentiment analysis(MSA) is mostly used to understand human emotional states through multimodal. However, due to the fact that the effective information carried by multimodal is not balanced, the modality containing less effective information cannot fully play the complementary role between modalities. Therefore, the goal of this paper is to fully explore the effective information in modalities and further optimize the under-optimized modal representation.To this end, we propose a novel Modal Feature Optimization Network (MFON) with a Modal Prompt Attention (MPA) mechanism for MSA. Specifically, we first determine which modalities are under-optimized in MSA, and then use relevant prompt information to focus the model on these features. This allows the model to focus more on the features of the modalities that need optimization, improving the utilization of each modality’s feature representation and facilitating initial information aggregation across modalities. Subsequently, we design an intra-modal knowledge distillation strategy for under-optimized modalities. This approach preserves the integrity of the modal features. Furthermore, we implement inter-modal contrastive learning to better extract related features across modalities, thereby optimizing the entire network. Finally, sentiment prediction is carried out through the effective fusion of multimodal information. Extensive experimental results on public benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art models.
pdf
bib
abs
EvoPrompt: Evolving Prompts for Enhanced Zero-Shot Named Entity Recognition with Large Language Models
Zeliang Tong
|
Zhuojun Ding
|
Wei Wei
Proceedings of the 31st International Conference on Computational Linguistics
Large language models (LLMs) possess extensive prior knowledge and powerful in-context learning (ICL) capabilities, presenting significant opportunities for low-resource tasks. Though effective, several key issues still have not been well-addressed when focusing on zero-shot named entity recognition (NER), including the misalignment between model and human definitions of entity types, and confusion of similar types. This paper proposes an Evolving Prompts framework that guides the model to better address these issues through continuous prompt refinement. Specifically, we leverage the model to summarize the definition of each entity type and the distinctions between similar types (i.e., entity type guidelines). An iterative process is introduced to continually adjust and improve these guidelines. Additionally, since high-quality demonstrations are crucial for effective learning yet challenging to obtain in zero-shot scenarios, we design a strategy motivated by self-consistency and prototype learning to extract reliable and diverse pseudo samples from the model’s predictions. Experiments on four benchmarks demonstrate the effectiveness of our framework, showing consistent performance improvements.
pdf
bib
abs
CoMIF: Modeling of Complex Multiple Interaction Factors for Conversation Generation
Yuxuan Chen
|
Wei Wei
|
Shixuan Fan
|
Kaihe Xu
|
Dangyang Chen
Proceedings of the 31st International Conference on Computational Linguistics
Highly realistic human-machine interaction is challenging for open-domain dialogue systems. Although existing methods have achieved notable progress by leveraging various interaction factors (e.g., emotion, personality, topic) for delivering human-like (e.g., empathetic, personalized and semantically-consistent) responses, they typically model such factor alone and thus easily suffer from low-quality response generation issue. We attribute this limitation to the neglect of implicit-correlations among factors. Furthermore, different factors may alternately dominate token-level response generation during decoding, making it harder to generate high-quality responses by applying various factors at the sentence level. To address the issue, we present a unified response generation framework, which is capable of simultaneously modeling Complex Multiple Interaction Factors (named CoMIF) to generate human-like conversations. To model the implicit correlations among factors, CoMIF first employ a dynamic perception module to construct a directed collaborative-graph to jointly learn the dynamics over time of each factor, as well as the cross-dependencies among them. Additionally, we also design a scalable post-adaptation module to introduce token-level factor signals to generate more human-like responses with appropriately multiple factors. Extensive experiments over multiple datasets demonstrate that the proposed method achieves the superior performance in generating more human-like responses with appropriate multiple-factors, as compared to the state-of-the-art methods.
pdf
bib
abs
SA-DETR:Span Aware Detection Transformer for Moment Retrieval
Tianheng Xiong
|
Wei Wei
|
Kaihe Xu
|
Dangyang Chen
Proceedings of the 31st International Conference on Computational Linguistics
Moment Retrieval aims to locate specific video segments related to the given text. Recently, DETR-based methods, originating from Object Detection, have emerged as effective solutions for Moment Retrieval. These approaches focus on multimodal feature fusion and refining Queries composed of span anchor and content embedding. Despite the success, they often overlook the video-text instance related information in Query Initialization and the crucial guidance role of span anchors in Query Refinement, leading to inaccurate predictions. To address this, we propose a novel Span Aware DEtection TRansformer (SA-DETR) that leverages the importance of instance related span anchors. To fully leverage the instance related information, we generate span anchors based on video-text pair rather than using learnable parameters, as is common in conventional DETR-based methods, and supervise them with GT labels. To effectively exploit the correspondence between span anchors and video clips, we enhance content embedding guided by textual features and generate Gaussian mask to modulate the interaction between content embedding and fusion features. Furthermore, we explore the feature alignment across various stages and granularities and apply denoise learning to boost the span awareness of the model. Extensive experiments on QVHighlights, Charades-STA, and TACoS demonstrate the effectiveness of our approach.
pdf
bib
abs
MMSciBench: Benchmarking Language Models on Chinese Multimodal Scientific Problems
Xinwu Ye
|
Chengfan Li
|
Siming Chen
|
Wei Wei
|
Robert Tang
Findings of the Association for Computational Linguistics: ACL 2025
Recent advances in large language models (LLMs) and vision-language models (LVLMs) have shown promise across many tasks, yet their scientific reasoning capabilities remain untested, particularly in multimodal settings. We present MMSciBench, a benchmark for evaluating mathematical and physical reasoning through text-only and text-image formats, with human-annotated difficulty levels, solutions with detailed explanations, and taxonomic mappings. Evaluation of state-of-the-art models reveals significant limitations, with even the best model achieving only 63.77% accuracy and particularly struggling with visual reasoning tasks. Our analysis exposes critical gaps in complex reasoning and visual-textual integration, establishing MMSciBench as a rigorous standard for measuring progress in multimodal scientific understanding. The code for MMSciBench is open-sourced at GitHub, and the dataset is available at Hugging Face.
2024
pdf
bib
abs
Confidence is not Timeless: Modeling Temporal Validity for Rule-based Temporal Knowledge Graph Forecasting
Rikui Huang
|
Wei Wei
|
Xiaoye Qu
|
Shengzhe Zhang
|
Dangyang Chen
|
Yu Cheng
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recently, Temporal Knowledge Graph Forecasting (TKGF) has emerged as a pivotal domain for forecasting future events. Unlike black-box neural network methods, rule-based approaches are lauded for their efficiency and interpretability. For this line of work, it is crucial to correctly estimate the predictive effectiveness of the rules, i.e., the confidence. However, the existing literature lacks in-depth investigation into how confidence evolves with time. Moreover, inaccurate and heuristic confidence estimation limits the performance of rule-based methods. To alleviate such issues, we propose a framework named TempValid to explicitly model the temporal validity of rules for TKGF. Specifically, we design a time function to model the interaction between temporal information with confidence. TempValid conceptualizes confidence and other coefficients as learnable parameters to avoid inaccurate estimation and combinatorial explosion. Furthermore, we introduce a rule-adversarial negative sampling and a time-aware negative sampling strategies to facilitate TempValid learning. Extensive experiments show that TempValid significantly outperforms previous state-of-the-art (SOTA) rule-based methods on six TKGF datasets. Moreover, it exhibits substantial advancements in cross-domain and resource-constrained rule learning scenarios.
pdf
bib
abs
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
Wendi Li
|
Wei Wei
|
Kaihe Xu
|
Wenfeng Xie
|
Dangyang Chen
|
Yu Cheng
Findings of the Association for Computational Linguistics: NAACL 2024
To meet the requirements of real-world applications, it is essential to control generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation while most existing methods suffer from overfitting issues (finetuning-based methods) or semantic collapse (post-processing methods). However, current RL methods are generally guided by coarse-grained (sentence/paragraph-level) feedback, which may lead to suboptimal performance owing to semantic twists or progressions within sentences. To tackle that, we propose a novel reinforcement learning algorithm named TOLE which formulates TOken-LEvel rewards for controllable text generation, and employs a “first-quantize-then-noise” paradigm to enhance the robustness of the RL algorithm. Furthermore, TOLE can be flexibly extended to multiple constraints with little computational expense. Experimental results show that our algorithm can achieve superior performance on both single-attribute and multi-attribute control tasks. We have released our codes at https://github.com/WindyLee0822/CTG.
2023
pdf
bib
abs
DocumentNet: Bridging the Data Gap in Document Pre-training
Lijun Yu
|
Jin Miao
|
Xiaoyu Sun
|
Jiayi Chen
|
Alexander Hauptmann
|
Hanjun Dai
|
Wei Wei
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Document understanding tasks, in particular, Visually-rich Document Entity Retrieval (VDER), have gained significant attention in recent years thanks to their broad applications in enterprise AI. However, publicly available data have been scarce for these tasks due to strict privacy constraints and high annotation costs. To make things worse, the non-overlapping entity spaces from different datasets hinder the knowledge transfer between document types. In this paper, we propose a method to collect massive-scale and weakly labeled data from the web to benefit the training of VDER models. The collected dataset, named DocumentNet, does not depend on specific document types or entity sets, making it universally applicable to all VDER tasks. The current DocumentNet consists of 30M documents spanning nearly 400 document types organized in a four-level ontology. Experiments on a set of broadly adopted VDER tasks show significant improvements when DocumentNet is incorporated into the pre-training for both classic and few-shot learning settings. With the recent emergence of large language models (LLMs), DocumentNet provides a large data source to extend their multimodal capabilities for VDER.
pdf
bib
abs
Miracle: Towards Personalized Dialogue Generation with Latent-Space Multiple Personal Attribute Control
Zhenyi Lu
|
Wei Wei
|
Xiaoye Qu
|
Xian-Ling Mao
|
Dangyang Chen
|
Jixiong Chen
Findings of the Association for Computational Linguistics: EMNLP 2023
Personalized dialogue systems aim to endow the chatbot agent with more anthropomorphic traits for human-like interactions. Previous approaches have explored explicitly user profile modeling using text descriptions, implicit derivation of user embeddings, or utilizing handicraft prompts for ChatGPT-like models. However, textual personas are limited in describing multi-faceted attributes (
e.g.,
language style, inner character nuances), implicit embedding suffers from personality sparsity, and handicraft prompts lack fine-grained and stable controllability. Hence, these approaches may struggle with complex personalized dialogue generation tasks that require generating controllable responses with multiple personal attributes. To this end, we propose
Miracle, a novel personalized dialogue generation method through
Mult
Iple Pe
Rsonal
Attributes
Control within
Latent-Space
Energy-based Models. ttributes
Control within
Latent-Space
Energy-based Models. Specifically, our approach first disentangles complex personality into multi-faceted attributes. Subsequently, we employ a conditional variational auto-encoder to align with the dense personalized responses within a latent joint attribute space. We have also tailored a dedicated energy function and customized the ordinary differential equations sampling method to offer flexible attribute composition and precise attribute control. Extensive experiments demonstrate that Miracle outperforms several strong baselines in terms of personality controllability and response generation quality. Our dataset and code are available at
https://github.com/LZY-the-boys/MIRACLEpdf
bib
abs
On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval
Jiayi Chen
|
Hanjun Dai
|
Bo Dai
|
Aidong Zhang
|
Wei Wei
Findings of the Association for Computational Linguistics: EMNLP 2023
Visually-rich document entity retrieval (VDER), which extracts key information (e.g. date, address) from document images like invoices and receipts, has become an important topic in industrial NLP applications. The emergence of new document types at a constant pace, each with its unique entity types, presents a unique challenge: many documents contain unseen entity types that occur only a couple of times. Addressing this challenge requires models to have the ability of learning entities in a few-shot manner. However, prior works for Few-shot VDER mainly address the problem at the document level with a predefined global entity space, which doesn’t account for the entity-level few-shot scenario: target entity types are locally personalized by each task and entity occurrences vary significantly among documents. To address this unexplored scenario, this paper studies a novel entity-level few-shot VDER task. The challenges lie in the uniqueness of the label space for each task and the increased complexity of out-of-distribution (OOD) contents. To tackle this novel task, we present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization that distinguishes between in-task and out-of-task distribution. Specifically, we adopt a hierarchical decoder (HC) and employ contrastive learning (ContrastProtoNet) to achieve this goal. Furthermore, we introduce a new dataset, FewVEX, to boost future research in the field of entity-level few-shot VDER. Experimental results demonstrate our approaches significantly improve the robustness of popular meta-learning baselines.
2022
pdf
bib
abs
Incorporating Causal Analysis into Diversified and Logical Response Generation
Jiayi Liu
|
Wei Wei
|
Zhixuan Chu
|
Xing Gao
|
Ji Zhang
|
Tan Yan
|
Yulin Kang
Proceedings of the 29th International Conference on Computational Linguistics
Although the Conditional Variational Auto-Encoder (CVAE) model can generate more diversified responses than the traditional Seq2Seq model, the responses often have low relevance with the input words or are illogical with the question. A causal analysis is carried out to study the reasons behind, and a methodology of searching for the mediators and mitigating the confounding bias in dialogues is provided. Specifically, we propose to predict the mediators to preserve relevant information and auto-regressively incorporate the mediators into generating process. Besides, a dynamic topic graph guided conditional variational auto-encoder (TGG-CVAE) model is utilized to complement the semantic space and reduce the confounding bias in responses. Extensive experiments demonstrate that the proposed model is able to generate both relevant and informative responses, and outperforms the state-of-the-art in terms of automatic metrics and human evaluations.
pdf
bib
abs
Capturing Global Structural Information in Long Document Question Answering with Compressive Graph Selector Network
Yuxiang Nie
|
Heyan Huang
|
Wei Wei
|
Xian-Ling Mao
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Long document question answering is a challenging task due to its demands for complex reasoning over long text. Previous works usually take long documents as non-structured flat texts or only consider the local structure in long documents. However, these methods usually ignore the global structure of the long document, which is essential for long-range understanding. To tackle this problem, we propose Compressive Graph Selector Network (CGSN) to capture the global structure in a compressive and iterative manner. The proposed model mainly focuses on the evidence selection phase of long document question answering. Specifically, it consists of three modules: local graph network, global graph network and evidence memory network. Firstly, the local graph network builds the graph structure of the chunked segment in token, sentence, paragraph and segment levels to capture the short-term dependency of the text. Secondly, the global graph network selectively receives the information of each level from the local graph, compresses them into the global graph nodes and applies graph attention to the global graph nodes to build the long-range reasoning over the entire text in an iterative way. Thirdly, the evidence memory network is designed to alleviate the redundancy problem in the evidence selection by saving the selected result in the previous steps. Extensive experiments show that the proposed model outperforms previous methods on two datasets.
pdf
bib
abs
Sequential Topic Selection Model with Latent Variable for Topic-Grounded Dialogue
Xiao-Fei Wen
|
Wei Wei
|
Xian-Ling Mao
Findings of the Association for Computational Linguistics: EMNLP 2022
Recently, topic-grounded dialogue system has attracted significant attention due to its effectiveness in predicting the next topic to yield better responses via the historical context and given topic sequence. However, almost all existing topic prediction solutions focus on only the current conversation and corresponding topic sequence to predict the next conversation topic, without exploiting other topic-guided conversations which may contain relevant topic-transitions to current conversation. To address the problem, in this paper we propose a novel approach, named Sequential Global Topic Attention (SGTA) to exploit topic transition over all conversations in a subtle way for better modeling post-to-response topic-transition and guiding the response generation to the current conversation. Specifically, we introduce a latent space modeled as a Multivariate Skew-Normal distribution with hybrid kernel functions to flexibly integrate the global-level information with sequence-level information, and predict the topic based on the distribution sampling results. We also leverage a topic-aware prior-posterior approach for secondary selection of predicted topics, which is utilized to optimize the response generation task. Extensive experiments demonstrate that our model outperforms competitive baselines on prediction and generation tasks.
pdf
bib
abs
HCL-TAT: A Hybrid Contrastive Learning Method for Few-shot Event Detection with Task-Adaptive Threshold
Ruihan Zhang
|
Wei Wei
|
Xian-Ling Mao
|
Rui Fang
|
Dangyang Chen
Findings of the Association for Computational Linguistics: EMNLP 2022
Event detection has been suffering from constantly emerging event types with lack of sufficient data. Existing works formulate the new problem as few-shot event detection (FSED), and employ two-stage or unified models based on meta-learning to address the problem. However, these methods fall far short of expectations due to: (i) insufficient learning of discriminative representations in low-resource scenarios, and (ii) representation overlap between triggers and non-triggers. To resolve the above issues, in this paper, we propose a novel Hybrid Contrastive Learning method with a Task-Adaptive Threshold (abbreviated as HCL-TAT), which enables discriminative representation learning with a two-view contrastive loss (support-support and prototype-query), and devises an easily-adapted threshold to alleviate misidentification of triggers. Extensive experiments on the benchmark dataset FewEvent demonstrate the superiority of our method to achieve better results compared to the state-of-the-arts. All the data and codes will be available to facilitate future research.
pdf
bib
abs
A GlobalPointer based Robust Approach for Information Extraction from Dialog Transcripts
Yanbo J. Wang
|
Sheng Chen
|
Hengxing Cai
|
Wei Wei
|
Kuo Yan
|
Zhe Sun
|
Hui Qin
|
Yuming Li
|
Xiaochen Cai
Proceedings of the Towards Semi-Supervised and Reinforced Task-Oriented Dialog Systems (SereTOD)
With the widespread popularisation of intelligent technology, task-based dialogue systems (TOD) are increasingly being applied to a wide variety of practical scenarios. As the key tasks in dialogue systems, named entity recognition and slot filling play a crucial role in the completeness and accuracy of information extraction. This paper is an evaluation paper for Sere-TOD 2022 Workshop challenge (Track 1 Information extraction from dialog transcripts). We proposed a multi-model fusion approach based on GlobalPointer, combined with some optimisation tricks, finally achieved an entity F1 of 60.73, an entity-slot-value triple F1 of 56, and an average F1 of 58.37, and got the highest score in SereTOD 2022 Workshop challenge
2021
pdf
bib
The First Workshop on Evaluations and Assessments of Neural Conversation Systems
Wei Wei
|
Bo Dai
|
Tuo Zhao
|
Lihong Li
|
Diyi Yang
|
Yun-Nung Chen
|
Y-Lan Boureau
|
Asli Celikyilmaz
|
Alborz Geramifard
|
Aman Ahuja
|
Haoming Jiang
The First Workshop on Evaluations and Assessments of Neural Conversation Systems
pdf
bib
abs
Multi-granularity Textual Adversarial Attack with Behavior Cloning
Yangyi Chen
|
Jin Su
|
Wei Wei
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Recently, the textual adversarial attack models become increasingly popular due to their successful in estimating the robustness of NLP models. However, existing works have obvious deficiencies. (1)They usually consider only a single granularity of modification strategies (e.g. word-level or sentence-level), which is insufficient to explore the holistic textual space for generation; (2) They need to query victim models hundreds of times to make a successful attack, which is highly inefficient in practice. To address such problems, in this paper we propose MAYA, a Multi-grAnularitY Attack model to effectively generate high-quality adversarial samples with fewer queries to victim models. Furthermore, we propose a reinforcement-learning based method to train a multi-granularity attack agent through behavior cloning with the expert knowledge from our MAYA algorithm to further reduce the query times. Additionally, we also adapt the agent to attack black-box models that only output labels without confidence scores. We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa in two different black-box attack settings and three benchmark datasets. Experimental results show that our models achieve overall better attacking performance and produce more fluent and grammatical adversarial samples compared to baseline models. Besides, our adversarial attack agent significantly reduces the query times in both attack settings. Our codes are released at
https://github.com/Yangyi-Chen/MAYA.
pdf
bib
abs
Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
Haoming Jiang
|
Bo Dai
|
Mengjiao Yang
|
Tuo Zhao
|
Wei Wei
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Reliable automatic evaluation of dialogue systems under an interactive environment has long been overdue. An ideal environment for evaluating dialog systems, also known as the Turing test, needs to involve human interaction, which is usually not affordable for large-scale experiments. Though researchers have attempted to use metrics for language generation tasks (e.g., perplexity, BLEU) or some model-based reinforcement learning methods (e.g., self-play evaluation) for automatic evaluation, these methods only show very weak correlation with the actual human evaluation in practice. To bridge such a gap, we propose a new framework named ENIGMA for estimating human evaluation scores based on recent advances of off-policy evaluation in reinforcement learning. ENIGMA only requires a handful of pre-collected experience data, and therefore does not involve human interaction with the target policy during the evaluation, making automatic evaluations feasible. More importantly, ENIGMA is model-free and agnostic to the behavior policies for collecting the experience data, which significantly alleviates the technical difficulties of modeling complex dialogue environments and human behaviors. Our experiments show that ENIGMA significantly outperforms existing methods in terms of correlation with human evaluation scores.
pdf
bib
abs
Context-aware Entity Typing in Knowledge Graphs
Weiran Pan
|
Wei Wei
|
Xian-Ling Mao
Findings of the Association for Computational Linguistics: EMNLP 2021
Knowledge graph entity typing aims to infer entities’ missing types in knowledge graphs which is an important but under-explored issue. This paper proposes a novel method for this task by utilizing entities’ contextual information. Specifically, we design two inference mechanisms: i) N2T: independently use each neighbor of an entity to infer its type; ii) Agg2T: aggregate the neighbors of an entity to infer its type. Those mechanisms will produce multiple inference results, and an exponentially weighted pooling method is used to generate the final inference result. Furthermore, we propose a novel loss function to alleviate the false-negative problem during training. Experiments on two real-world KGs demonstrate the effectiveness of our method. The source code and data of this paper can be obtained from
https://github.com/CCIIPLab/CET.
2020
pdf
bib
abs
A Data-Centric Framework for Composable NLP Workflows
Zhengzhong Liu
|
Guanxiong Ding
|
Avinash Bukkittu
|
Mansi Gupta
|
Pengzhi Gao
|
Atif Ahmed
|
Shikun Zhang
|
Xin Gao
|
Swapnil Singhavi
|
Linwei Li
|
Wei Wei
|
Zecong Hu
|
Haoran Shi
|
Xiaodan Liang
|
Teruko Mitamura
|
Eric Xing
|
Zhiting Hu
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Empirical natural language processing (NLP) systems in application domains (e.g., healthcare, finance, education) involve interoperation among multiple components, ranging from data ingestion, human annotation, to text retrieval, analysis, generation, and visualization. We establish a unified open-source framework to support fast development of such sophisticated NLP workflows in a composable manner. The framework introduces a uniform data representation to encode heterogeneous results by a wide range of NLP tasks. It offers a large repository of processors for NLP tasks, visualization, and annotation, which can be easily assembled with full interoperability under the unified representation. The highly extensible framework allows plugging in custom processors from external off-the-shelf NLP and deep learning libraries. The whole framework is delivered through two modularized yet integratable open-source projects, namely Forte (for workflow infrastructure and NLP function processors) and Stave (for user interaction, visualization, and annotation).
pdf
bib
abs
AirConcierge: Generating Task-Oriented Dialogue via Efficient Large-Scale Knowledge Retrieval
Chieh-Yang Chen
|
Pei-Hsin Wang
|
Shih-Chieh Chang
|
Da-Cheng Juan
|
Wei Wei
|
Jia-Yu Pan
Findings of the Association for Computational Linguistics: EMNLP 2020
Despite recent success in neural task-oriented dialogue systems, developing such a real-world system involves accessing large-scale knowledge bases (KBs), which cannot be simply encoded by neural approaches, such as memory network mechanisms. To alleviate the above problem, we propose , an end-to-end trainable text-to-SQL guided framework to learn a neural agent that interacts with KBs using the generated SQL queries. Specifically, the neural agent first learns to ask and confirm the customer’s intent during the multi-turn interactions, then dynamically determining when to ground the user constraints into executable SQL queries so as to fetch relevant information from KBs. With the help of our method, the agent can use less but more accurate fetched results to generate useful responses efficiently, instead of incorporating the entire KBs. We evaluate the proposed method on the AirDialogue dataset, a large corpus released by Google, containing the conversations of customers booking flight tickets from the agent. The experimental results show that significantly improves over previous work in terms of accuracy and the BLEU score, which demonstrates not only the ability to achieve the given task but also the good quality of the generated dialogues.
pdf
bib
abs
Question Answering with Long Multiple-Span Answers
Ming Zhu
|
Aman Ahuja
|
Da-Cheng Juan
|
Wei Wei
|
Chandan K. Reddy
Findings of the Association for Computational Linguistics: EMNLP 2020
Answering questions in many real-world applications often requires complex and precise information excerpted from texts spanned across a long document. However, currently no such annotated dataset is publicly available, which hinders the development of neural question-answering (QA) systems. To this end, we present MASH-QA, a Multiple Answer Spans Healthcare Question Answering dataset from the consumer health domain, where answers may need to be excerpted from multiple, non-consecutive parts of text spanned across a long document. We also propose MultiCo, a neural architecture that is able to capture the relevance among multiple answer spans, by using a query-based contextualized sentence selection approach, for forming the answer to the given question. We also demonstrate that conventional QA models are not suitable for this type of task and perform poorly in this setting. Extensive experiments are conducted, and the experimental results confirm the proposed model significantly outperforms the state-of-the-art QA models in this multi-span QA setting.
2019
pdf
bib
abs
Evaluating and Enhancing the Robustness of Dialogue Systems: A Case Study on a Negotiation Agent
Minhao Cheng
|
Wei Wei
|
Cho-Jui Hsieh
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Recent research has demonstrated that goal-oriented dialogue agents trained on large datasets can achieve striking performance when interacting with human users. In real world applications, however, it is important to ensure that the agent performs smoothly interacting with not only regular users but also those malicious ones who would attack the system through interactions in order to achieve goals for their own advantage. In this paper, we develop algorithms to evaluate the robustness of a dialogue agent by carefully designed attacks using adversarial agents. Those attacks are performed in both black-box and white-box settings. Furthermore, we demonstrate that adversarial training using our attacks can significantly improve the robustness of a goal-oriented dialogue system. On a case-study of the negotiation agent developed by (Lewis et al., 2017), our attacks reduced the average advantage of rewards between the attacker and the trained RL-based agent from 2.68 to -5.76 on a scale from -10 to 10 for randomized goals. Moreover, we show that with the adversarial training, we are able to improve the robustness of negotiation agents by 1.5 points on average against all our attacks.
pdf
bib
abs
On the Robustness of Self-Attentive Models
Yu-Lun Hsieh
|
Minhao Cheng
|
Da-Cheng Juan
|
Wei Wei
|
Wen-Lian Hsu
|
Cho-Jui Hsieh
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
This work examines the robustness of self-attentive neural networks against adversarial input perturbations. Specifically, we investigate the attention and feature extraction mechanisms of state-of-the-art recurrent neural networks and self-attentive architectures for sentiment analysis, entailment and machine translation under adversarial attacks. We also propose a novel attack algorithm for generating more natural adversarial examples that could mislead neural models but not humans. Experimental results show that, compared to recurrent neural models, self-attentive models are more robust against adversarial perturbation. In addition, we provide theoretical explanations for their superior robustness to support our claims.
2018
pdf
bib
abs
AirDialogue: An Environment for Goal-Oriented Dialogue Research
Wei Wei
|
Quoc Le
|
Andrew Dai
|
Jia Li
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Recent progress in dialogue generation has inspired a number of studies on dialogue systems that are capable of accomplishing tasks through natural language interactions. A promising direction among these studies is the use of reinforcement learning techniques, such as self-play, for training dialogue agents. However, current datasets are limited in size, and the environment for training agents and evaluating progress is relatively unsophisticated. We present AirDialogue, a large dataset that contains 301,427 goal-oriented conversations. To collect this dataset, we create a context-generator which provides travel and flight restrictions. We then ask human annotators to play the role of a customer or an agent and interact with the goal of successfully booking a trip given the restrictions. Key to our environment is the ease of evaluating the success of the dialogue, which is achieved by using ground-truth states (e.g., the flight being booked) generated by the restrictions. Any dialogue agent that does not generate the correct states is considered to fail. Our experimental results indicate that state-of-the-art dialogue models can only achieve a score of 0.17 while humans can reach a score of 0.91, which suggests significant opportunities for future improvement.
2017
pdf
bib
abs
Structural Embedding of Syntactic Trees for Machine Comprehension
Rui Liu
|
Junjie Hu
|
Wei Wei
|
Zi Yang
|
Eric Nyberg
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Deep neural networks for machine comprehension typically utilizes only word or character embeddings without explicitly taking advantage of structured linguistic information such as constituency trees and dependency trees. In this paper, we propose structural embedding of syntactic trees (SEST), an algorithm framework to utilize structured information and encode them into vector representations that can boost the performance of algorithms for the machine comprehension. We evaluate our approach using a state-of-the-art neural attention model on the SQuAD dataset. Experimental results demonstrate that our model can accurately identify the syntactic boundaries of the sentences and extract answers that are syntactically coherent over the baseline methods.
2013
pdf
bib
abs
The CASIA machine translation system for IWSLT 2013
Xingyuan Peng
|
Xiaoyin Fu
|
Wei Wei
|
Zhenbiao Chen
|
Wei Chen
|
Bo Xu
Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign
In this paper, we describe the CASIA statistical machine translation (SMT) system for the IWSLT2013 Evaluation Campaign. We participated in the Chinese-English and English-Chinese translation tasks. For both of these tasks, we used a hierarchical phrase-based (HPB) decoder and made it as our baseline translation system. A number of techniques were proposed to deal with these translation tasks, including parallel sentence extraction, pre-processing, translation model (TM) optimization, language model (LM) interpolation, turning, and post-processing. With these techniques, the translation results were significantly improved compared with that of the baseline system.
pdf
bib
Phrase-based Parallel Fragments Extraction from Comparable Corpora
Xiaoyin Fu
|
Wei Wei
|
Shixiang Lu
|
Zhenbiao Chen
|
Bo Xu
Proceedings of the Sixth International Joint Conference on Natural Language Processing
2012
pdf
bib
Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models
Shixiang Lu
|
Wei Wei
|
Xiaoyin Fu
|
Bo Xu
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
2011
pdf
bib
Effective Use of Discontinuous Phrases for Hierarchical Phrase-based Translation
Wei Wei
|
Bo Xu
Proceedings of Machine Translation Summit XIII: Papers
pdf
bib
Enhancing the HL-SOT Approach to Sentiment Analysis via a Localized Feature Selection Framework
Wei Wei
|
Jon Atle Gulla
Proceedings of 5th International Joint Conference on Natural Language Processing
2010
pdf
bib
Sentiment Learning on Product Reviews via Sentiment Ontology Tree
Wei Wei
|
Jon Atle Gulla
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
2006
pdf
bib
NLPR translation system for IWSLT 2006 evaluation campaign
Chunguang Chai
|
Jinhua Du
|
Wei Wei
|
Peng Liu
|
Keyan Zhou
|
Yanqing He
|
Chengqing Zong
Proceedings of the Third International Workshop on Spoken Language Translation: Evaluation Campaign
2005
pdf
bib
The CASIA Phrase-Based Machine Translation System
Wei Pang
|
Zhendong Yang
|
Zhenbiao Chen
|
Wei Wei
|
Bo Xu
|
Chengqing Zong
Proceedings of the Second International Workshop on Spoken Language Translation