Shizhu He

2024

Large language models (LLMs) have demonstrated remarkable capabilities in comprehensively handling various types of natural language processing (NLP) tasks. However, there are significant differences in the knowledge and abilities required for different tasks. Therefore, it is important to understand whether the same LLM processes different tasks in the same way. Are there specific neurons in a LLM for different tasks? Inspired by neuroscience, this paper pioneers the exploration of whether distinct neurons are activated when a LLM handles different tasks. Compared with current research exploring the neurons of language and knowledge, task-specific neurons present a greater challenge due to their abstractness, diversity, and complexity. To address these challenges, this paper proposes a method for task-specific neuron localization based on Causal Gradient Variation with Special Tokens (CGVST). CGVST identifies task-specific neurons by concentrating on the most significant tokens during task processing, thereby eliminating redundant tokens and minimizing interference from non-essential neurons. Compared to traditional neuron localization methods, our approach can more effectively identify task-specific neurons. We conduct experiments across eight different public tasks. Experiments involving the inhibition and amplification of identified neurons demonstrate that our method can accurately locate task-specific neurons.

We introduce DA-Code, a code generation benchmark specifically designed to assess LLMs on agent-based data science tasks. This benchmark features three core elements: First, the tasks within DA-Code are inherently challenging, setting them apart from traditional code generation tasks and demanding advanced coding skills in grounding and planning. Second, examples in DA-Code are all based on real and diverse data, covering a wide range of complex data wrangling and analytics tasks. Third, to solve the tasks, the models must utilize complex data science programming languages, including Python and SQL, to perform intricate data processing and derive the answers. We set up the benchmark in a controllable and executable environment that aligns with real-world data analysis scenarios and is scalable. The annotators meticulously designed the evaluation suite to ensure the accuracy and robustness of the evaluation. We developed the DA-Agent baseline. Experiments show that although the baseline performs better than other existing frameworks, using the current best LLMs achieves only 30.5% accuracy, leaving ample room for improvement. We release our benchmark at [link](https://github.com/yiyihum/dabench)

To address the issues of insufficient knowledge and hallucination in Large Language Models (LLMs), numerous studies have explored integrating LLMs with Knowledge Graphs (KGs). However, these methods are typically evaluated on conventional Knowledge Graph Question Answering (KGQA) with complete KGs, where all factual triples required for each question are entirely covered by the given KG. In such cases, LLMs primarily act as an agent to find answer entities within the KG, rather than effectively integrating the internal knowledge of LLMs and external knowledge sources such as KGs. In fact, KGs are often incomplete to cover all the knowledge required to answer questions. To simulate these real-world scenarios and evaluate the ability of LLMs to integrate internal and external knowledge, we propose leveraging LLMs for QA under Incomplete Knowledge Graph (IKGQA), where the provided KG lacks some of the factual triples for each question, and construct corresponding datasets. To handle IKGQA, we propose a training-free method called Generate-on-Graph (GoG), which can generate new factual triples while exploring KGs. Specifically, GoG performs reasoning through a Thinking-Searching-Generating framework, which treats LLM as both Agent and KG in IKGQA. Experimental results on two datasets demonstrate that our GoG outperforms all previous methods.

Large Language Models (LLMs) can teach small language models (SLMs) to solve complex reasoning tasks (e.g., mathematical question answering) by Chain-of-thought Distillation (CoTD). Specifically, CoTD fine-tunes SLMs by utilizing rationales generated from LLMs such as ChatGPT. However, CoTD has certain limitations that make it unsuitable for knowledge-intensive multi-hop question answering: 1) SLMs have a very limited capacity in memorizing required knowledge compared to LLMs. 2) SLMs do not possess the same powerful integrated abilities in question understanding and knowledge reasoning as LLMs. To address the above limitations, we introduce Decompose-and-Response Distillation (D&R Distillation), which distills two student models, namely Decomposer and Responser separately. The two models solve a knowledge-intensive multi-hop question through an interactive process of asking and answering subquestions. Our method offers two advantages: 1) SLMs have the capability to access external knowledge to address subquestions, which provides more comprehensive knowledge for multi-hop questions. 2) By employing simpler subquestions instead of complex CoT reasoning, SLMs effectively mitigate task complexity and decrease data prerequisites. Experimental results on three knowledge-intensive multi-hop question answering datasets demonstrate that D&R Distillation can surpass previous CoTD methods, even with much less training data.

pdf abs
Instance-Level Dynamic LoRAs Composition for Cross-Task Generalization
Zhiqi Wang | Shizhu He | Kang Liu | Jun Zhao
Findings of the Association for Computational Linguistics: EMNLP 2024

Large language models perform well on tasks that have undergone fine-tuning of instructions, but their performance on completely unseen tasks is often less than ideal. To overcome the challenge of cross-task generalization, task-level LoRAs combination is proposed, which does not require training a model for new tasks. Instead, it learns the LoRA modules combination weights based on a small number of samples to form the task model. However, task-level LoRAs combination only utilizes a few task modules due to its reliance on the weight enumeration method, and it also ignores the specificity between different instances. Therefore, we proposed an instance-level LoRAs composition for cross-task generalization, which selects appropriate multiple task LoRA modules for each input instance and dynamically determines the composition weights. Our experiments on publicly available datasets show that our method outperforms the typical method, LoraHub, in 16 out of 27 tasks. We release the source code at https://github.com/noname822/iLoraComp.git

pdf abs
S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Model
Fangyu Lei | Qian Liu | Yiming Huang | Shizhu He | Jun Zhao | Kang Liu
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

The rapid development of Large Language Models (LLMs) has led to great strides in model capabilities like long-context understanding and reasoning.However, as LLMs are able to process longer contexts, it becomes more challenging to evaluate whether they have acquired certain capabilities, since the length of text (e.g., 200K tokens) they can process far exceeds what humans can reliably assess in a reasonable duration.In this paper, we propose using complex synthetic tasks as a proxy evaluation method, and present S3Eval, a Synthetic, Scalable, Systematic evaluation suite for LLMs evaluation.The synthetic nature of S3Eval provides users full control over the dataset, allowing them to systematically probe LLM capabilities by scaling text length and varying task difficulty across diverse scenarios.The strong correlation between S3Eval and real-world benchmarks demonstrates the soundness of using S3Eval for evaluation of LLMs.S3Eval provides a flexible and infinite long-context data generation method. We have generated a comprehensive dataset called S3Eval-Standard, and experimental results have shown that it poses significant challenges for all existing LLMs.

Although Large Language Models (LLMs) are showing impressive performance on a wide range of Natural Language Processing tasks, researchers have found that they still have limited ability to conduct induction. Recent works mainly adopt “post processes” paradigms to improve the performance of LLMs on induction (e.g., the hypothesis search & refinement methods), but their performance is still constrained by the inherent inductive capability of the LLMs. In this paper, we propose a novel framework, Induction through Deduction (ItD), to enable the LLMs to teach themselves induction through deduction. The ItD framework is composed of two main components: a Deductive Data Generation module to generate induction data and a Naive Bayesian Induction module to optimize the fine-tuning and decoding of LLMs. Our empirical results showcase the effectiveness of ItD on two induction benchmarks, achieving relative performance improvement of 36% and 10% compared with previous state-of-the-art, respectively. Our ablation study verifies the effectiveness of two key modules of ItD. We also verify the effectiveness of ItD across different LLMs and deductors. The data and code of this paper can be found at https://github.com/forangel2014/ItD.

pdf abs
BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation
Yuhong He | Yongqi Zhang | Shizhu He | Jun Wan
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Medical dialogue generation (MDG) has gained increasing attention due to its substantial practical value. Previous works typically employ a sequence-to-sequence framework to generate medical responses by modeling dialogue context as sequential text with annotated medical entities. While these methods have been successful in generating fluent responses, they fail to provide process explanations of reasoning and require extensive entity annotation. To address these limitations, we propose the method Bootstrap Prompting for Explicit Reasoning in MDG (BP4ER), which explicitly model MDG’s multi-step reasoning process and iteratively enhance this reasoning process. We employ a least-to-most prompting strategy to guide a large language model (LLM) in explicit reasoning, breaking down MDG into simpler sub-questions. These sub-questions build on answers from previous ones. Additionally, we also introduce two distinct bootstrapping techniques for prompting, which autonomously correct errors and facilitate the LLM’s explicit reasoning. This approach eliminates the need for entity annotation and increases the transparency of the MDG process by explicitly generating the intermediate reasoning chain. Experimental results on the two publicly datasets show that BP4ER outperforms state-of-the-art methods across both objective and subjective evaluation.

Chain-of-thought Distillation (CoTD) aims at distilling Chain-of-thought (CoT) reasoning ability of large language models (LLMs) to much smaller student models. The core of CoTD is using a large teacher model to generate rationales and fine-tune smaller student models. However, current Chain-of-thought Distillation works have the following limitations: 1) Student models are separately distilled from specific reasoning tasks and lack a collaboration mechanism, hindering the enhancement of reasoning performance through collaboration among various reasoning tasks. 2) The parameter update of student models severely harms the CoT reasoning ability on other unseen reasoning tasks not included in the distillation process. In this work, we introduce a novel CoT Distillation method, MoDE-CoTD, which decouples the CoT reasoning abilities out of the student model by distilling multiple LoRA-Experts and freezing the parameters of the student model. Sequentially, LoRA-Experts are combined and adapted to handle both seen and unseen reasoning tasks, enabling collaboration among diverse reasoning tasks to further enhance CoT reasoning performance. Experimental results on 14 datasets (including 4 unseen datasets) demonstrate the strength of MoDE-CoTD, with an average accuracy gain of 6.3% on seen datasets and 7.8% on unseen datasets.

pdf abs
Towards Graph-hop Retrieval and Reasoning in Complex Question Answering over Textual Database
Minjun Zhu | Yixuan Weng | Shizhu He | Kang Liu | Haifeng Liu | Yang jun Jun | Jun Zhao
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In textual question answering (TQA) systems, complex questions often require retrieving multiple textual fact chains with multiple reasoning steps. While existing benchmarks are limited to single-chain or single-hop retrieval scenarios. In this paper, we propose to conduct Graph-Hop —— a novel multi-chains and multi-hops retrieval and reasoning paradigm in complex question answering. We construct a new benchmark called ReasonGraphQA, which provides explicit and fine-grained evidence graphs for complex question to support comprehensive and detailed reasoning. In order to further study how graph-based evidential reasoning can be performed, we explore what form of Graph-Hop works best for generating textual evidence explanations in knowledge reasoning and question answering. We have thoroughly evaluated existing evidence retrieval and reasoning models on the ReasonGraphQA. Experiments highlight Graph-Hop is a promising direction for answering complex questions, but it still has certain limitations. We have further studied mitigation strategies to meet these challenges and discuss future directions.

2023

Answering multi-hop questions over hybrid factual knowledge from the given text and table (TextTableQA) is a challenging task. Existing models mainly adopt a retriever-reader framework, which have several deficiencies, such as noisy labeling in training retriever, insufficient utilization of heterogeneous information over text and table, and deficient ability for different reasoning operations. In this paper, we propose a three-stage TextTableQA framework S3HQA, which comprises of retriever, selector, and reasoner. We use a retriever with refinement training to solve the noisy labeling problem. Then, a hybrid selector considers the linked relationships between heterogeneous data to select the most relevant factual knowledge. For the final stage, instead of adapting a reading comprehension module like in previous methods, we employ a generation-based reasoner to obtain answers. This includes two approaches: a row-wise generator and an LLM prompting generator (first time used in this task). The experimental results demonstrate that our method achieves competitive results in the few-shot setting. When trained on the full dataset, our approach outperforms all baseline methods, ranking first on the HybridQA leaderboard.

pdf abs
On the Effects of Structural Modeling for Neural Semantic Parsing
Xiang Zhang | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 27th Conference on Computational Natural Language Learning (CoNLL)

Semantic parsing aims to map natural language sentences to predefined formal languages, such as logic forms and programming languages, as the semantic annotation. From the theoretic views of linguistic and programming language, structures play an important role in both languages, which had motivated semantic parsers since the task was proposed in the beginning. But in the neural era, semantic parsers treating both natural and formal language as sequences, such as Seq2Seq and LLMs, have got more attentions. On the other side, lots of neural progress have been made for grammar induction, which only focuses on natural languages. Although closely related in the sense of structural modeling, these techniques hadn’t been jointly analyzed on the semantic parsing testbeds. To gain the better understanding on structures for semantic parsing, we design a taxonomy of structural modeling methods, and evaluate some representative techniques on semantic parsing, including both compositional and i.i.d. generalizations. In addition to the previous opinion that structures will help in general, we find that (1) structures must be designed for the specific dataset and generalization level, and (2) what really matters is not the structure choice of either source or target side, but the choice combination of both sides. Based on the finding, we further propose a metric that can evaluate the structure choice, which we believe can boost the automation of grammar designs for specific datasets and domains.

Intent detection, which estimates diverse intents behind user utterances, is an essential component of task-oriented dialogue systems. Previous intent detection models are usually trained offline, which can only handle predefined intent classes. In the real world, new intents may keep challenging deployed models. For example, with the prevalence of the COVID-19 pandemic, users may pose various issues related to the pandemic to conversational systems, which brings many new intents. A general intent detection model should be intelligent enough to continually learn new data and recognize new arriving intent classes. Therefore, this work explores Class Lifelong Learning for Intent Detection (CLL-ID), where the model continually learns new intent classes from new data while avoiding catastrophic performance degradation on old data. To this end, we propose a novel lifelong learning method, called Structure Consolidation Networks (SCN), which consists of structure-based retrospection and contrastive knowledge distillation to handle the problems of expression diversity and class imbalance in the CLL-ID task. In addition to formulating the new task, we construct 3 benchmarks based on 8 intent detection datasets. Experimental results demonstrate the effectiveness of SCN, which significantly outperforms previous lifelong learning methods on the three benchmarks.

pdf abs
Prediction and Calibration: Complex Reasoning over Knowledge Graph with Bi-directional Directed Acyclic Graph Neural Network
Yao Xu | Shizhu He | Li Cai | Kang Liu | Jun Zhao
Findings of the Association for Computational Linguistics: ACL 2023

Answering complex logical queries is a challenging task for knowledge graph (KG) reasoning. Recently, query embedding (QE) has been proposed to encode queries and entities into the same vector space, and obtain answers based on numerical computation. However, such models obtain the node representations of a query only based on its predecessor nodes, which ignore the information contained in successor nodes. In this paper, we proposed a Bi-directional Directed Acyclic Graph neural network (BiDAG) that splits the reasoning process into prediction and calibration. The joint probability of all nodes is considered by applying a graph neural network (GNN) to the query graph in the calibration process. By the prediction in the first layer and the calibration in deep layers of GNN, BiDAG can outperform previous QE based methods on FB15k, FB15k-237, and NELL995.

Multilingual Knowledge Graph Completion (mKGC) aim at solving queries in different languages by reasoning a tail entity thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages, its pretraining tasks cannot be directly aligned with the mKGC tasks. Moreover, the majority of KGs and PLMs currently available exhibit a pronounced English-centric bias. This makes it difficult for mKGC to achieve good results, particularly in the context of low-resource languages. To overcome previous problems, this paper introduces global and local knowledge constraints for mKGC. The former is used to constrain the reasoning of answer entities , while the latter is used to enhance the representation of query contexts. The proposed method makes the pretrained model better adapt to the mKGC task. Experimental results on public datasets demonstrate that our method outperforms the previous SOTA on Hits@1 and Hits@10 by an average of 12.32% and 16.03%, which indicates that our proposed method has significant enhancement on mKGC.

Recently, with the chain of thought (CoT) prompting, large language models (LLMs), e.g., GPT-3, have shown strong reasoning ability in several natural language processing tasks such as arithmetic, commonsense, and logical reasoning. However, LLMs with CoT require multi-step prompting and multi-token prediction, which is highly sensitive to individual mistakes and vulnerable to error accumulation. The above issues make the LLMs need the ability to verify the answers. In fact, after inferring conclusions in some thinking decision tasks, people often check them by re-verifying steps to avoid some mistakes. In this paper, we propose and prove that LLMs also have similar self-verification abilities. We take the conclusion obtained by CoT as one of the conditions for solving the original problem. By performing a backward verification of the answers that LLM deduced for itself, we can obtain interpretable answer validation scores to select the candidate answer with the highest score. Experimental results demonstrate that the proposed method can improve the reasoning performance on various arithmetic, commonsense, and logical reasoning datasets. Our code is publicly available at: https://github.com/WENGSYX/Self-Verification.

Pre-trained sentence representations are crucial for identifying significant sentences in unsupervised document extractive summarization. However, the traditional two-step paradigm of pre-training and sentence-ranking, creates a gap due to differing optimization objectives. To address this issue, we argue that utilizing pre-trained embeddings derived from a process specifically designed to optimize informative and distinctive sentence representations helps rank significant sentences. To do so, we propose a novel graph pre-training auto-encoder to obtain sentence embeddings by explicitly modelling intra-sentential distinctive features and inter-sentential cohesive features through sentence-word bipartite graphs. These fine-tuned sentence embeddings are then utilized in a graph-based ranking algorithm for unsupervised summarization. Our method is a plug-and-play pre-trained model that produces predominant performance for unsupervised summarization frameworks by providing summary-worthy sentence representations. It surpasses heavy BERT- or RoBERTa-based sentence representations in downstream tasks.

Complex Query Answering (CQA) is a challenge task of Knowledge Graph (KG). Due to the incompleteness of KGs, query embedding (QE) methods have been proposed to encode queries and entities into the same embedding space, and treat logical operators as neural set operators to obtain answers. However, these methods train KG embeddings and neural set operators concurrently on both simple (one-hop) and complex (multi-hop and logical) queries, which causes performance degradation on simple queries and low training efficiency. In this paper, we propose Query to Triple (Q2T), a novel approach that decouples the training for simple and complex queries. Q2T divides the training into two stages: (1) Pre-training the neural link predictor on simple queries to predict tail entities based on the head entity and relation. (2) Training the query encoder on complex queries to encode diverse complex queries into a unified triple form that can be efficiently solved by the pretrained link predictor. Our proposed Q2T is not only efficient to train, but also modular, thus easily adaptable to various neural link predictors that have been studied well. Extensive experiments demonstrate that, even without explicit modeling for neural set operators, Q2T still achieves state-of-the-art performance on diverse complex queries over three public benchmarks.

pdf abs
Efficient Data Learning for Open Information Extraction with Pre-trained Language Models
Zhiyuan Fan | Shizhu He
Findings of the Association for Computational Linguistics: EMNLP 2023

Open Information Extraction (OpenIE) is a fundamental yet challenging task in Natural Language Processing, which involves extracting all triples (subject, predicate, object) from a given sentence. While labelling-based methods have their merits, generation-based techniques offer unique advantages, such as the ability to generate tokens not present in the original sentence. However, these generation-based methods often require a significant amount of training data to learn the task form of OpenIE and substantial training time to overcome slow model convergence due to the order penalty. In this paper, we introduce a novel framework, OK-IE, that ingeniously transforms the task form of OpenIE into the pre-training task form of the T5 model, thereby reducing the need for extensive training data. Furthermore, we introduce an innovative concept of ‘anchors’ to control the sequence of model outputs, effectively eliminating the impact of order penalty on model convergence and significantly reducing training time. Experimental results indicate that, compared to previous SOTA methods, OK-IE requires only 1/100 of the training data (900 instances) and 1/120 of the training time (3 minutes) to achieve comparable results.

pdf abs
ExpNote: Black-box Large Language Models are better Task Solvers with Experience Notebook
Wangtao Sun | Xuanqing Yu | Shizhu He | Jun Zhao | Kang Liu
Findings of the Association for Computational Linguistics: EMNLP 2023

Black-box Large Language Models (LLMs) have shown great power in solving various tasks and are considered general problem solvers. However, LLMs still fail in many specific tasks although understand the task instruction. In this paper, we focus on the problem of boosting the ability of black-box LLMs to solve downstream tasks. We propose ExpNote, an automated framework to help LLMs better adapt to unfamiliar tasks through reflecting and noting experiences from training data and retrieving them from external memory during testing. We evaluate ExpNote on multiple tasks and the experimental results demonstrate that the proposed method significantly improves the performance of black-box LLMs. The data and code are available at https://github.com/forangel2014/ExpNote.

pdf
Multi-Target Semantic Parsing with Collaborative Deliberation Network
Xiang Li | Fangyu Lei | Shizhu He | Kang Liu | Jun Zhao
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)

pdf abs
Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model
Fei Xia | Yixuan Weng | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics

Taxonomies, which organize domain concepts into hierarchical structures, are crucial for building knowledge systems and downstream applications. As domain knowledge evolves, taxonomies need to be continuously updated to include new concepts. Previous approaches have mainly focused on adding concepts to the leaf nodes of the existing hierarchical tree, which does not fully utilize the taxonomy’s knowledge and is unable to update the original taxonomy structure (usually involving non-leaf nodes). In this paper, we propose a two-stage method called ATTEMPT for taxonomy completion. Our method inserts new concepts into the correct position by finding a parent node and labeling child nodes. Specifically, by combining local nodes with prompts to generate natural sentences, we take advantage of pre-trained language models for hypernym/hyponymy recognition. Experimental results on two public datasets (including six domains) show that ATTEMPT performs best on both taxonomy completion and extension tasks, surpassing existing methods.

2022

Text-to-SQL aims to parse natural language questions into SQL queries, which is valuable in providing an easy interface to access large databases. Previous work has observed that leveraging lexico-logical alignments is very helpful to improve parsing performance. However, current attention-based approaches can only model such alignments at the token level and have unsatisfactory generalization capability. In this paper, we propose a new approach to leveraging explicit lexico-logical alignments. It first identifies possible phrase-level alignments and injects them as additional contexts to guide the parsing procedure. Experimental results on Squall show that our approach can make better use of such alignments and obtains an absolute improvement of 3.4% compared with the current state-of-the-art.

Visual Dialogue (VD) task has recently received increasing attention in AI research. Visual Dialog aims to generate multi-round, interactive responses based on the dialog history and image content. Existing textual dialogue models cannot fully understand visual information, resulting in a lack of scene features when communicating with humans continuously. Therefore, how to efficiently fuse multimodal data features remains to be a challenge. In this work, we propose a knowledge transfer method with visual prompt (VPTG) fusing multi-modal data, which is a flexible module that can utilize the text-only seq2seq model to handle visual dialogue tasks. The VPTG conducts text-image co-learning and multi-modal information fusion with visual prompts and visual knowledge distillation. Specifically, we construct visual prompts from visual representations and then induce sequence-to-sequence(seq2seq) models to fuse visual information and textual contexts by visual-text patterns. And we also realize visual knowledge transfer through distillation between two different models’ text representations, so that the seq2seq model can actively learn visual semantic representations. Extensive experiments on the multi-modal dialogue understanding and generation (MDUG) datasets show the proposed VPTG outperforms other single-modal methods, which demonstrate the effectiveness of visual prompt and visual knowledge transfer.

The medical conversational system can relieve doctors’ burden and improve healthcare efficiency, especially during the COVID-19 pandemic. However, the existing medical dialogue systems have the problems of weak scalability, insufficient knowledge, and poor controllability. Thus, we propose a medical conversational question-answering (CQA) system based on the knowledge graph, namely MedConQA, which is designed as a pipeline framework to maintain high flexibility. Our system utilizes automated medical procedures, including medical triage, consultation, image-text drug recommendation, and record. Each module has been open-sourced as a tool, which can be used alone or in combination, with robust scalability. Besides, to conduct knowledge-grounded dialogues with users, we first construct a Chinese Medical Knowledge Graph (CMKG) and collect a large-scale Chinese Medical CQA (CMCQA) dataset, and we design a series of methods for reasoning more intellectually. Finally, we use several state-of-the-art (SOTA) techniques to keep the final generated response more controllable, which is further assured by hospital and professional evaluations. We have open-sourced related code, datasets, web pages, and tools, hoping to advance future research.

pdf abs
Incremental Intent Detection for Medical Domain with Contrast Replay Networks
Guirong Bai | Shizhu He | Kang Liu | Jun Zhao
Findings of the Association for Computational Linguistics: ACL 2022

Conventional approaches to medical intent detection require fixed pre-defined intent categories. However, due to the incessant emergence of new medical intents in the real world, such requirement is not practical. Considering that it is computationally expensive to store and re-train the whole data every time new data and intents come in, we propose to incrementally learn emerged intents while avoiding catastrophically forgetting old intents. We first formulate incremental learning for medical intent detection. Then, we employ a memory-based method to handle incremental learning. We further propose to enhance the method with contrast replay networks, which use multilevel distillation and contrast objective to address training data imbalance and medical rare words respectively. Experiments show that the proposed method outperforms the state-of-the-art model by 5.7% and 9.1% of accuracy on two benchmarks respectively.

pdf abs
LingJing at SemEval-2022 Task 1: Multi-task Self-supervised Pre-training for Multilingual Reverse Dictionary
Bin Li | Yixuan Weng | Fei Xia | Shizhu He | Bin Sun | Shutao Li
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)

This paper introduces the approach of Team LingJing’s experiments on SemEval-2022 Task 1 Comparing Dictionaries and Word Embeddings (CODWOE). This task aims at comparing two types of semantic descriptions and including two sub-tasks: the definition modeling and reverse dictionary track. Our team focuses on the reverse dictionary track and adopts the multi-task self-supervised pre-training for multilingual reverse dictionaries. Specifically, the randomly initialized mDeBERTa-base model is used to perform multi-task pre-training on the multilingual training datasets. The pre-training step is divided into two stages, namely the MLM pre-training stage and the contrastive pre-training stage. The experimental results show that the proposed method has achieved good performance in the reverse dictionary track, where we rank the 1-st in the Sgns targets of the EN and RU languages. All the experimental codes are open-sourced at https://github.com/WENGSYX/Semeval.

This paper presents the results and main findings of our system on SemEval-2022 Task 3 Presupposed Taxonomies: Evaluating Neural Network Semantics (PreTENS). This task aims at semantic competence with specific attention on the evaluation of language models, which is a task with respect to the recognition of appropriate taxonomic relations between two nominal arguments. Two sub-tasks including binary classification and regression are designed for the evaluation. For the classification sub-task, we adopt the DeBERTa-v3 pre-trained model for fine-tuning datasets of different languages. Due to the small size of the training datasets of the regression sub-task, we transfer the knowledge of classification model (i.e., model parameters) to the regression task. The experimental results show that the proposed method achieves the best results on both sub-tasks. Meanwhile, we also report negative results of multiple training strategies for further discussion. All the experimental codes are open-sourced at https://github.com/WENGSYX/Semeval.

pdf
Answering Numerical Reasoning Questions in Table-Text Hybrid Contents with Graph-based Encoder and Tree-based Decoder
Fangyu Lei | Shizhu He | Xiang Li | Jun Zhao | Kang Liu
Proceedings of the 29th International Conference on Computational Linguistics

pdf abs
Decoupling Mixture-of-Graphs: Unseen Relational Learning for Knowledge Graph Completion by Fusing Ontology and Textual Experts
Ran Song | Shizhu He | Suncong Zheng | Shengxiang Gao | Kang Liu | Zhengtao Yu | Jun Zhao
Proceedings of the 29th International Conference on Computational Linguistics

Knowledge Graph Embedding (KGE) has been proposed and successfully utilized to knowledge Graph Completion (KGC). But classic KGE paradigm often fail in unseen relation representations. Previous studies mainly utilize the textual descriptions of relations and its neighbor relations to represent unseen relations. In fact, the semantics of a relation can be expressed by three kinds of graphs: factual graph, ontology graph, textual description graph, and they can complement each other. A more common scenario in the real world is that seen and unseen relations appear at the same time. In this setting, the training set (only seen relations) and testing set (both seen and unseen relations) own different distributions. And the train-test inconsistency problem will make KGE methods easiy overfit on seen relations and under-performance on unseen relations. In this paper, we propose decoupling mixture-of-graph experts (DMoG) for unseen relations learning, which could represent the unseen relations in the factual graph by fusing ontology and textual graphs, and decouple fusing space and reasoning space to alleviate overfitting for seen relations. The experiments on two unseen only public datasets and a mixture dataset verify the effectiveness of the proposed method, which improves the state-of-the-art methods by 6.84% in Hits@10 on average.

2021

Dialogue state tracking (DST), which estimates user goals given a dialogue context, is an essential component of task-oriented dialogue systems. Conventional DST models are usually trained offline, which requires a fixed dataset prepared in advance. This paradigm is often impractical in real-world applications since online dialogue systems usually involve continually emerging new data and domains. Therefore, this paper explores Domain-Lifelong Learning for Dialogue State Tracking (DLL-DST), which aims to continually train a DST model on new data to learn incessantly emerging new domains while avoiding catastrophically forgetting old learned domains. To this end, we propose a novel domain-lifelong learning method, called Knowledge Preservation Networks (KPN), which consists of multi-prototype enhanced retrospection and multi-strategy knowledge distillation, to solve the problems of expression diversity and combinatorial explosion in the DLL-DST task. Experimental results show that KPN effectively alleviates catastrophic forgetting and outperforms previous state-of-the-art lifelong learning methods by 4.25% and 8.27% of whole joint goal accuracy on the MultiWOZ benchmark and the SGD benchmark, respectively.

2020

pdf abs
Pre-trained Language Model Based Active Learning for Sentence Matching
Guirong Bai | Shizhu He | Kang Liu | Jun Zhao | Zaiqing Nie
Proceedings of the 28th International Conference on Computational Linguistics

Active learning is able to significantly reduce the annotation cost for data-driven techniques. However, previous active learning approaches for natural language processing mainly depend on the entropy-based uncertainty criterion, and ignore the characteristics of natural language. In this paper, we propose a pre-trained language model based active learning approach for sentence matching. Differing from previous active learning, it can provide linguistic criteria from the pre-trained language model to measure instances and help select more effective instances for annotation. Experiments demonstrate our approach can achieve greater accuracy with fewer labeled training instances.

2019

pdf abs
Vocabulary Pyramid Network: Multi-Pass Encoding and Decoding with Multi-Level Vocabularies for Response Generation
Cao Liu | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We study the task of response generation. Conventional methods employ a fixed vocabulary and one-pass decoding, which not only make them prone to safe and general responses but also lack further refining to the first generated raw sequence. To tackle the above two problems, we present a Vocabulary Pyramid Network (VPN) which is able to incorporate multi-pass encoding and decoding with multi-level vocabularies into response generation. Specifically, the dialogue input and output are represented by multi-level vocabularies which are obtained from hierarchical clustering of raw words. Then, multi-pass encoding and decoding are conducted on the multi-level vocabularies. Since VPN is able to leverage rich encoding and decoding information with multi-level vocabularies, it has the potential to generate better responses. Experiments on English Twitter and Chinese Weibo datasets demonstrate that VPN remarkably outperforms strong baselines.

pdf abs
AdaNSP: Uncertainty-driven Adaptive Decoding in Neural Semantic Parsing
Xiang Zhang | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Neural semantic parsers utilize the encoder-decoder framework to learn an end-to-end model for semantic parsing that transduces a natural language sentence to the formal semantic representation. To keep the model aware of the underlying grammar in target sequences, many constrained decoders were devised in a multi-stage paradigm, which decode to the sketches or abstract syntax trees first, and then decode to target semantic tokens. We instead to propose an adaptive decoding method to avoid such intermediate representations. The decoder is guided by model uncertainty and automatically uses deeper computations when necessary. Thus it can predict tokens adaptively. Our model outperforms the state-of-the-art neural models and does not need any expertise like predefined grammar or sketches in the meantime.

pdf abs
Incorporating Interlocutor-Aware Context into Response Generation on Multi-Party Chatbots
Cao Liu | Kang Liu | Shizhu He | Zaiqing Nie | Jun Zhao
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Conventional chatbots focus on two-party response generation, which simplifies the real dialogue scene. In this paper, we strive toward a novel task of Response Generation on Multi-Party Chatbot (RGMPC), where the generated responses heavily rely on the interlocutors’ roles (e.g., speaker and addressee) and their utterances. Unfortunately, complex interactions among the interlocutors’ roles make it challenging to precisely capture conversational contexts and interlocutors’ information. Facing this challenge, we present a response generation model which incorporates Interlocutor-aware Contexts into Recurrent Encoder-Decoder frameworks (ICRED) for RGMPC. Specifically, we employ interactive representations to capture dialogue contexts for different interlocutors. Moreover, we leverage an addressee memory to enhance contextual interlocutor information for the target addressee. Finally, we construct a corpus for RGMPC based on an existing open-access dataset. Automatic and manual evaluations demonstrate that the ICRED remarkably outperforms strong baselines.

pdf abs
Learning the Extraction Order of Multiple Relational Facts in a Sentence with Reinforcement Learning
Xiangrong Zeng | Shizhu He | Daojian Zeng | Kang Liu | Shengping Liu | Jun Zhao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

The multiple relation extraction task tries to extract all relational facts from a sentence. Existing works didn’t consider the extraction order of relational facts in a sentence. In this paper we argue that the extraction order is important in this task. To take the extraction order into consideration, we apply the reinforcement learning into a sequence-to-sequence model. The proposed model could generate relational facts freely. Widely conducted experiments on two public datasets demonstrate the efficacy of the proposed method.

pdf abs
Generating Questions for Knowledge Bases via Incorporating Diversified Contexts and Answer-Aware Loss
Cao Liu | Kang Liu | Shizhu He | Zaiqing Nie | Jun Zhao
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

We tackle the task of question generation over knowledge bases. Conventional methods for this task neglect two crucial research issues: 1) the given predicate needs to be expressed; 2) the answer to the generated question needs to be definitive. In this paper, we strive toward the above two issues via incorporating diversified contexts and answer-aware loss. Specifically, we propose a neural encoder-decoder model with multi-level copy mechanisms to generate such questions. Furthermore, the answer aware loss is introduced to make generated questions corresponding to more definitive answers. Experiments demonstrate that our model achieves state-of-the-art performance. Meanwhile, such generated question is able to express the given predicate and correspond to a definitive answer.

2018

pdf abs
Extracting Relational Facts by an End-to-End Neural Model with Copy Mechanism
Xiangrong Zeng | Daojian Zeng | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The relational facts in sentences are often complicated. Different relational triplets may have overlaps in a sentence. We divided the sentences into three types according to triplet overlap degree, including Normal, EntityPairOverlap and SingleEntiyOverlap. Existing methods mainly focus on Normal class and fail to extract relational triplets precisely. In this paper, we propose an end-to-end model based on sequence-to-sequence learning with copy mechanism, which can jointly extract relational facts from sentences of any of these classes. We adopt two different strategies in decoding process: employing only one united decoder or applying multiple separated decoders. We test our models in two public datasets and our model outperform the baseline method significantly.

pdf abs
Pattern-revising Enhanced Simple Question Answering over Knowledge Bases
Yanchao Hao | Hao Liu | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 27th International Conference on Computational Linguistics

Question Answering over Knowledge Bases (KB-QA), which automatically answer natural language questions based on the facts contained by a knowledge base, is one of the most important natural language processing (NLP) tasks. Simple questions constitute a large part of questions queried on the web, still being a challenge to QA systems. In this work, we propose to conduct pattern extraction and entity linking first, and put forward pattern revising procedure to mitigate the error propagation problem. In order to learn to rank candidate subject-predicate pairs to enable the relevant facts retrieval given a question, we propose to do joint fact selection enhanced by relation detection. Multi-level encodings and multi-dimension information are leveraged to strengthen the whole procedure. The experimental results demonstrate that our approach sets a new record in this task, outperforming the current state-of-the-art by an absolute large margin.

2017

The IJCNLP-2017 Multi-choice Question Answering(MCQA) task aims at exploring the performance of current Question Answering(QA) techniques via the realworld complex questions collected from Chinese Senior High School Entrance Examination papers and CK12 website1. The questions are all 4-way multi-choice questions writing in Chinese and English respectively that cover a wide range of subjects, e.g. Biology, History, Life Science and etc. And, all questions are restrained within the elementary and middle school level. During the whole procedure of this task, 7 teams submitted 323 runs in total. This paper describes the collected data, the format and size of these questions, formal run statistics and results, overview and performance statistics of different methods

pdf abs
Which is the Effective Way for Gaokao: Information Retrieval or Neural Networks?
Shangmin Guo | Xiangrong Zeng | Shizhu He | Kang Liu | Jun Zhao
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

As one of the most important test of China, Gaokao is designed to be difficult enough to distinguish the excellent high school students. In this work, we detailed the Gaokao History Multiple Choice Questions(GKHMC) and proposed two different approaches to address them using various resources. One approach is based on entity search technique (IR approach), the other is based on text entailment approach where we specifically employ deep neural networks(NN approach). The result of experiment on our collected real Gaokao questions showed that they are good at different categories of questions, that is IR approach performs much better at entity questions(EQs) while NN approach shows its advantage on sentence questions(SQs). We achieve state-of-the-art performance and show that it’s indispensable to apply hybrid method when participating in the real-world tests.

pdf abs
Generating Natural Answers by Incorporating Copying and Retrieving Mechanisms in Sequence-to-Sequence Learning
Shizhu He | Cao Liu | Kang Liu | Jun Zhao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Generating answer with natural language sentence is very important in real-world question answering systems, which needs to obtain a right answer as well as a coherent natural response. In this paper, we propose an end-to-end question answering system called COREQA in sequence-to-sequence learning, which incorporates copying and retrieving mechanisms to generate natural answers within an encoder-decoder framework. Specifically, in COREQA, the semantic units (words, phrases and entities) in a natural answer are dynamically predicted from the vocabulary, copied from the given question and/or retrieved from the corresponding knowledge base jointly. Our empirical study on both synthetic and real-world datasets demonstrates the efficiency of COREQA, which is able to generate correct, coherent and natural answers for knowledge inquired questions.

pdf abs
An End-to-End Model for Question Answering over Knowledge Base with Cross-Attention Combining Global Knowledge
Yanchao Hao | Yuanzhe Zhang | Kang Liu | Shizhu He | Zhanyi Liu | Hua Wu | Jun Zhao
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

With the rapid growth of knowledge bases (KBs) on the web, how to take full advantage of them becomes increasingly important. Question answering over knowledge base (KB-QA) is one of the promising approaches to access the substantial knowledge. Meanwhile, as the neural network-based (NN-based) methods develop, NN-based KB-QA has already achieved impressive results. However, previous work did not put more emphasis on question representation, and the question is converted into a fixed vector regardless of its candidate answers. This simple representation strategy is not easy to express the proper information in the question. Hence, we present an end-to-end neural network model to represent the questions and their corresponding scores dynamically according to the various candidate answer aspects via cross-attention mechanism. In addition, we leverage the global knowledge inside the underlying KB, aiming at integrating the rich KB information into the representation of the answers. As a result, it could alleviates the out-of-vocabulary (OOV) problem, which helps the cross-attention model to represent the question more precisely. The experimental results on WebQuestions demonstrate the effectiveness of the proposed approach.

Shizhu He

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

Co-authors

Venues