Xiaobin Wang
2026
Towards General Agentic Intelligence via Environment Scaling
Runnan Fang | Shihao Cai | Baixuan Li | Jialong Wu | Guangyu Li | Wenbiao Yin | Xinyu Wang | Xiaobin Wang | Liangcai Su | Zhen Zhang | Shibin Wu | Zhengwei Tao | Yong Jiang | Pengjun Xie | Ningyu Zhang | Fei Huang | Wentao Zhang | Jingren Zhou
Findings of the Association for Computational Linguistics: ACL 2026
Runnan Fang | Shihao Cai | Baixuan Li | Jialong Wu | Guangyu Li | Wenbiao Yin | Xinyu Wang | Xiaobin Wang | Liangcai Su | Zhen Zhang | Shibin Wu | Zhengwei Tao | Yong Jiang | Pengjun Xie | Ningyu Zhang | Fei Huang | Wentao Zhang | Jingren Zhou
Findings of the Association for Computational Linguistics: ACL 2026
Advanced agentic intelligence is a prerequisite for deploying Large Language Models in practical, real-world applications. Diverse real-world APIs demand precise, robust function-calling intelligence, which needs agents to develop these capabilities through interaction in varied environments. The breadth of function-calling competence is closely tied to the diversity of environments in which agents are trained. In this work, we scale up environments as a step towards advancing general agentic intelligence. This gives rise to two central challenges: (i) how to scale environments in a principled manner, and (ii) how to effectively train agentic capabilities from experiences derived through interactions with these environments. To address these, we design a scalable framework that automatically constructs heterogeneous environments that are fully simulated, broadening the space of function-calling scenarios. We further adapt a two-phase agent fine-tuning strategy: first endowing agents with fundamental agentic capabilities, then specializing them for domain-specific contexts. Extensive experiments on agentic benchmarks, -bench, -Bench, and ACEBench, demonstrate that our trained model, AgentScaler, significantly enhances the models’ function-calling capability.
Dial HEALTHDIAL for Advice: A Multilingual and Multi-Parallel Spoken Dialogue Dataset for Knowledge-Grounded Information Seeking
Songbo Hu | Yinhong Liu | Ej Zhou | Evgeniia Razumovskaia | Xiaobin Wang | Alexander Fraser | Ivan Vuli\'c | Anna Korhonen
Findings of the Association for Computational Linguistics: ACL 2026
Songbo Hu | Yinhong Liu | Ej Zhou | Evgeniia Razumovskaia | Xiaobin Wang | Alexander Fraser | Ivan Vuli\'c | Anna Korhonen
Findings of the Association for Computational Linguistics: ACL 2026
Creating spoken dialogue datasets is methodologically challenging, and these challenges are amplified when the goal is to build multilingual, multi-parallel datasets at scale. This work introduces HEALTHDIAL, a large-scale, multilingual, and multi-parallel dataset for developing and evaluating retrieval-augmented generation (RAG)–based spoken dialogue systems. The dataset comprises 6,000 information-seeking dialogues (1,500 per language) grounded in trusted content from the World Health Organization (WHO) and 163 hours of user speech recorded from native speakers of diverse dialects across four official WHO languages: Arabic, Chinese, English, and Spanish. Each speaker is annotated with demographic (e.g., gender, age) and sociolinguistic (e.g., primary language, region of origin) variables. We report benchmark results across key dialogue tasks, which reveal consistent performance disparities across languages, even among high-resource ones. To support future research, we release the dataset, a prototype system, and a toolkit for data collection and system evaluation.
Memp: Exploring Agent Procedural Memory
Runnan Fang | Yuan Liang | Xiaobin Wang | Jialong Wu | Shuofei Qiao | Pengjun Xie | Fei Huang | Huajun Chen | Ningyu Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Runnan Fang | Yuan Liang | Xiaobin Wang | Jialong Wu | Shuofei Qiao | Pengjun Xie | Fei Huang | Huajun Chen | Ningyu Zhang
Findings of the Association for Computational Linguistics: ACL 2026
Large Language Models (LLMs) based agents excel at diverse tasks, yet they suffer from brittle procedural memory that is manually engineered or entangled in static parameters. In this work, we investigate strategies to endow agents with a learnable, updatable, and lifelong procedural memory. We propose a procedural-memory repository that distills past agent trajectories into both fine-grained, step-by-step instructions and higher-level, script-like abstractions. Coupled with a dynamic regimen that continuously updates, corrects, and deprecates its contents, this repository evolves in lockstep with new experience. Empirical evaluation on TravelPlanner and Alfworld shows that as the memory repository is refined, agents achieve steadily higher success rates and greater efficiency on analogous tasks. Moreover, procedural memory built from a stronger model retains its value: migrating the procedural memory to a weaker model yields substantial performance gains.
U-Fold: Dynamic Intent-Aware Context Folding for User-Centric Agents
Jin Su | Runnan Fang | Yeqiu Li | Xiaobin Wang | Shihao Cai | Pengjun Xie | Ningyu Zhang | Fajie Yuan
Findings of the Association for Computational Linguistics: ACL 2026
Jin Su | Runnan Fang | Yeqiu Li | Xiaobin Wang | Shihao Cai | Pengjun Xie | Ningyu Zhang | Fajie Yuan
Findings of the Association for Computational Linguistics: ACL 2026
Large language model (LLM)-based agents have been successfully deployed in many tool-augmented settings, but their scalability is fundamentally constrained by context length. Existing context-folding methods mitigate this issue by summarizing past interactions, yet they are typically designed for single-query or single-intent scenarios. In more realistic user-centric dialogues, we identify two major failure modes: (i) they irreversibly discard fine-grained constraints and intermediate facts that are crucial for later decisions, and (ii) their summaries fail to track evolving user intent, leading to omissions and erroneous actions. To address these limitations, we propose U-Fold, a dynamic context-folding framework tailored to user-centric tasks. U-Fold retains the full user–agent dialogue and tool-call history but, at each turn, uses two core components to produce an intent-aware, evolving dialogue summary and a compact, task-relevant tool log. Extensive experiments on 𝜏-bench, 𝜏2-bench, VitaBench, and harder context-inflated settings show that U-Fold consistently outperforms ReAct (achieving a 71.4% win rate in long-context settings) and prior folding baselines (with improvements of up to 27.0%), particularly on long, noisy, multi-turn tasks. Our study demonstrates that U-Fold is a promising step toward transferring context-management techniques from single-query benchmarks to realistic user-centric applications.
2025
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement
Runnan Fang | Xiaobin Wang | Yuan Liang | Shuofei Qiao | Jialong Wu | Zekun Xi | Ningyu Zhang | Yong Jiang | Pengjun Xie | Fei Huang | Huajun Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Runnan Fang | Xiaobin Wang | Yuan Liang | Shuofei Qiao | Jialong Wu | Zekun Xi | Ningyu Zhang | Yong Jiang | Pengjun Xie | Fei Huang | Huajun Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate unconventional action spaces. To empower agents to autonomously explore environments, optimize workflows, and enhance their understanding of actions, we propose SynWorld, a framework that allows agents to synthesize possible scenarios with multi-step action invocation within the action space and perform Monte Carlo Tree Search (MCTS) exploration to effectively refine their action knowledge in the current environment. Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments.
ProductAgent: Benchmarking Conversational Product Search Agent with Asking Clarification Questions
Jingheng Ye | Yong Jiang | Xiaobin Wang | Yinghui Li | Yangning Li | Pengjun Xie | Fei Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Jingheng Ye | Yong Jiang | Xiaobin Wang | Yinghui Li | Yangning Li | Pengjun Xie | Fei Huang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Online shoppers often initiate their journey with only a vague idea of what they need, forcing them to iterate over search results until they eventually discover a suitable product. We formulate this scenario as product demand clarification: starting from an ambiguous query, an agent must iteratively ask clarifying questions, progressively refine the user’s intent, and retrieve increasingly relevant items. To tackle this challenge, we present **ProductAgent**, a fully autonomous conversational information-seeking agent that couples large language models with a set of domain-specific tools. ProductAgent maintains a structured memory of the dialogue, summarizes candidate products into concise feature statistics, generates strategic clarification questions, and performs retrieval over hybrid (symbolic + dense) indices in a closed decision loop. To measure real–world effectiveness, we further introduce **PROCLARE**, a PROduct CLArifying REtrieval benchmark that pairs ProductAgent with an LLM-driven user simulator, thereby enabling large-scale and reproducible evaluation without human annotation. On 2,000 automatically generated sessions, retrieval metrics improve monotonically with the number of turns, validating that ProductAgent captures and refines user intent through dialogue.
Agentic Knowledgeable Self-awareness
Shuofei Qiao | Zhisong Qiu | Baochang Ren | Xiaobin Wang | Xiangyuan Ru | Ningyu Zhang | Xiang Chen | Yong Jiang | Pengjun Xie | Fei Huang | Huajun Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Shuofei Qiao | Zhisong Qiu | Baochang Ren | Xiaobin Wang | Xiangyuan Ru | Ningyu Zhang | Xiang Chen | Yong Jiang | Pengjun Xie | Fei Huang | Huajun Chen
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large Language Models (LLMs) have achieved considerable performance across various agentic planning tasks. However, traditional approaches adopt a “flood irrigation” methodology that indiscriminately injects gold trajectories, external feedback, and domain knowledge into agent models. This practice overlooks the fundamental human cognitive principle of self-awareness - the ability to dynamically assess situational demands and strategically employ resources during decision-making. We propose Agentic Knowledgeable Self-awareness to address this gap, a novel paradigm enabling LLM-based agents to autonomously regulate knowledge utilization. Specifically, we propose KnowSelf, a data-centric approach that applies agents with knowledgeable self-awareness like humans. Concretely, we devise a heuristic situation judgement criterion to mark special tokens on the agent’s self-explored trajectories for collecting training data. Through a two-stage training process, the agent model can switch between different situations by generating specific special tokens, achieving optimal planning effects with minimal costs. Our experiments demonstrate that can outperform various strong baselines on different tasks and models with minimal use of external knowledge.
2024
DIALIGHT: Lightweight Multilingual Development and Evaluation of Task-Oriented Dialogue Systems with Large Language Models
Songbo Hu | Xiaobin Wang | Moy Yuan | Anna Korhonen | Ivan Vulić
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations)
Songbo Hu | Xiaobin Wang | Moy Yuan | Anna Korhonen | Ivan Vulić
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: System Demonstrations)
We present DIALIGHT, a toolkit for developing and evaluating multilingual Task-Oriented Dialogue (ToD) systems which facilitates systematic evaluations and comparisons between ToD systems using fine-tuning of Pretrained Language Models (PLMs) and those utilising the zero-shot and in-context learning capabilities of Large Language Models (LLMs). In addition to automatic evaluation, this toolkit features (i) a secure, user-friendly web interface for fine-grained human evaluation at both local utterance level and global dialogue level, and (ii) a microservice-based backend, improving efficiency and scalability. Our evaluations reveal that while PLM fine-tuning leads to higher accuracy and coherence, LLM-based systems excel in producing diverse and likeable responses. However, we also identify significant challenges of LLMs in adherence to task-specific instructions and generating outputs in multiple languages, highlighting areas for future research. We hope this open-sourced toolkit will serve as a valuable resource for researchers aiming to develop and properly evaluate multilingual ToD systems and will lower, currently still high, entry barriers in the field.
Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process
Peng Wang | Xiaobin Wang | Chao Lou | Shengyu Mao | Pengjun Xie | Yong Jiang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Peng Wang | Xiaobin Wang | Chao Lou | Shengyu Mao | Pengjun Xie | Yong Jiang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
In-context learning (ICL) is a few-shot learning paradigm that involves learning mappings through input-output pairs and appropriately applying them to new instances. Despite the remarkable ICL capabilities demonstrated by Large Language Models (LLMs), existing works are highly dependent on large-scale labeled support sets, not always feasible in practical scenarios. To refine this approach, we focus primarily on an innovative selective annotation mechanism, which precedes the standard demonstration retrieval. We introduce the Language Model-based Determinant Point Process (LM-DPP) that simultaneously considers the uncertainty and diversity of unlabeled instances for optimal selection. Consequently, this yields a subset for annotation that strikes a trade-off between the two factors. We apply LM-DPP to various language models, including GPT-J, LlaMA, and GPT-3. Experimental results on 9 NLU and 2 Generation datasets demonstrate that LM-DPP can effectively select canonical examples. Further analysis reveals that LLMs benefit most significantly from subsets that are both low uncertainty and high diversity.
2023
MANNER: A Variational Memory-Augmented Model for Cross Domain Few-Shot Named Entity Recognition
Jinyuan Fang | Xiaobin Wang | Zaiqiao Meng | Pengjun Xie | Fei Huang | Yong Jiang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Jinyuan Fang | Xiaobin Wang | Zaiqiao Meng | Pengjun Xie | Fei Huang | Yong Jiang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper focuses on the task of cross domain few-shot named entity recognition (NER), which aims to adapt the knowledge learned from source domain to recognize named entities in target domain with only a few labeled examples. To address this challenging task, we propose MANNER, a variational memory-augmented few-shot NER model. Specifically, MANNER uses a memory module to store information from the source domain and then retrieve relevant information from the memory to augment few-shot task in the target domain. In order to effectively utilize the information from memory, MANNER uses optimal transport to retrieve and process information from memory, which can explicitly adapt the retrieved information from source domain to target domain and improve the performance in the cross domain few-shot setting. We conduct experiments on English and Chinese cross domain few-shot NER datasets, and the experimental results demonstrate that MANNER can achieve superior performance.
Exploring Lottery Prompts for Pre-trained Language Models
Yulin Chen | Ning Ding | Xiaobin Wang | Shengding Hu | Haitao Zheng | Zhiyuan Liu | Pengjun Xie
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Yulin Chen | Ning Ding | Xiaobin Wang | Shengding Hu | Haitao Zheng | Zhiyuan Liu | Pengjun Xie
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Consistently scaling pre-trained language models (PLMs) imposes substantial burdens on model adaptation, necessitating more efficient alternatives to conventional fine-tuning. Given the advantage of prompting in the zero-shot setting and the observed performance fluctuation among different prompts, we explore the instance-level prompt and their generalizability.By searching through the prompt space, we first validate the assumption that for every instance, there is almost always a lottery prompt that induces the correct prediction from the PLM, and such prompt can be obtained at a low cost thanks to the inherent ability of PLMs.Meanwhile, it is shown that some strong lottery prompts have high performance over the whole training set, and they are equipped with distinguishable linguistic features. Lastly, we attempt to generalize the searched strong lottery prompts to unseen data with prompt ensembling method. Experiments are conducted on various types of NLP classification tasks and demonstrate that the proposed method can achieve comparable results with other gradient-free and optimization-free baselines.
Recall, Expand, and Multi-Candidate Cross-Encode: Fast and Accurate Ultra-Fine Entity Typing
Chengyue Jiang | Wenyang Hui | Yong Jiang | Xiaobin Wang | Pengjun Xie | Kewei Tu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Chengyue Jiang | Wenyang Hui | Yong Jiang | Xiaobin Wang | Pengjun Xie | Kewei Tu
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Ultra-fine entity typing (UFET) predicts extremely free-formed types (e.g., president, politician) of a given entity mention (e.g., Joe Biden) in context. State-of-the-art (SOTA) methods use the cross-encoder (CE) based architecture. CE concatenates a mention (and its context) with each type and feeds the pair into a pretrained language model (PLM) to score their relevance. It brings deeper interaction between the mention and the type to reach better performance but has to perform N (the type set size) forward passes to infer all the types of a single mention. CE is therefore very slow in inference when the type set is large (e.g., N=10k for UFET). % Cross-encoder also ignores the correlation between different types.To this end, we propose to perform entity typing in a recall-expand-filter manner. The recall and expansion stages prune the large type set and generate K (typically much smaller than N) most relevant type candidates for each mention. At the filter stage, we use a novel model called {pasted macro ‘NAME’} to concurrently encode and score all these K candidates in only one forward pass to obtain the final type prediction. We investigate different model options for each stage and conduct extensive experiments to compare each option, experiments show that our method reaches SOTA performance on UFET and is thousands of times faster than the CE-based architecture. We also found our method is very effective in fine-grained (130 types) and coarse-grained (9 types) entity typing. Our code is available at {pasted macro ‘CODE’}.
Few-shot Classification with Hypersphere Modeling of Prototypes
Ning Ding | Yulin Chen | Ganqu Cui | Xiaobin Wang | Haitao Zheng | Zhiyuan Liu | Pengjun Xie
Findings of the Association for Computational Linguistics: ACL 2023
Ning Ding | Yulin Chen | Ganqu Cui | Xiaobin Wang | Haitao Zheng | Zhiyuan Liu | Pengjun Xie
Findings of the Association for Computational Linguistics: ACL 2023
Metric-based meta-learning is one of the de facto standards in few-shot learning. It composes of representation learning and metrics calculation designs. Previous works construct class representations in different ways, varying from mean output embedding to covariance and distributions. However, using embeddings in space lacks expressivity and cannot capture class information robustly, while statistical complex modeling poses difficulty to metric designs. In this work, we use tensor fields (“areas”) to model classes from the geometrical perspective for few-shot learning. We present a simple and effective method, dubbed as hypersphere prototypes (HyperProto), where class information is represented by hyperspheres with dynamic sizes with two sets of learnable parameters: the hypersphere’s center and the radius. Extending from points to areas, hyperspheres are much more expressive than embeddings. Moreover, it is more convenient to perform metric-based classification with hypersphere prototypes than statistical modeling, as we only need to calculate the distance from a data point to the surface of the hypersphere. Following this idea, we also develop two variants of prototypes under other measurements. Extensive experiments and analysis on few-shot NLP tasks and comparison with 20+ competitive baselines demonstrate the effectiveness of our approach.
2022
DAMO-NLP at SemEval-2022 Task 11: A Knowledge-based System for Multilingual Named Entity Recognition
Xinyu Wang | Yongliang Shen | Jiong Cai | Tao Wang | Xiaobin Wang | Pengjun Xie | Fei Huang | Weiming Lu | Yueting Zhuang | Kewei Tu | Wei Lu | Yong Jiang
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
Xinyu Wang | Yongliang Shen | Jiong Cai | Tao Wang | Xiaobin Wang | Pengjun Xie | Fei Huang | Weiming Lu | Yueting Zhuang | Kewei Tu | Wei Lu | Yong Jiang
Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)
The MultiCoNER shared task aims at detecting semantically ambiguous and complex named entities in short and low-context settings for multiple languages. The lack of contexts makes the recognition of ambiguous named entities challenging. To alleviate this issue, our team DAMO-NLP proposes a knowledge-based system, where we build a multilingual knowledge base based on Wikipedia to provide related context information to the named entity recognition (NER) model. Given an input sentence, our system effectively retrieves related contexts from the knowledge base. The original input sentences are then augmented with such context information, allowing significantly better contextualized token representations to be captured. Our system wins 10 out of 13 tracks in the MultiCoNER shared task.
Domain-Specific NER via Retrieving Correlated Samples
Xin Zhang | Yong Jiang | Xiaobin Wang | Xuming Hu | Yueheng Sun | Pengjun Xie | Meishan Zhang
Proceedings of the 29th International Conference on Computational Linguistics
Xin Zhang | Yong Jiang | Xiaobin Wang | Xuming Hu | Yueheng Sun | Pengjun Xie | Meishan Zhang
Proceedings of the 29th International Conference on Computational Linguistics
Successful Machine Learning based Named Entity Recognition models could fail on texts from some special domains, for instance, Chinese addresses and e-commerce titles, where requires adequate background knowledge. Such texts are also difficult for human annotators. In fact, we can obtain some potentially helpful information from correlated texts, which have some common entities, to help the text understanding. Then, one can easily reason out the correct answer by referencing correlated samples. In this paper, we suggest enhancing NER models with correlated samples. We draw correlated samples by the sparse BM25 retriever from large-scale in-domain unlabeled data. To explicitly simulate the human reasoning process, we perform a training-free entity type calibrating by majority voting. To capture correlation features in the training stage, we suggest to model correlated samples by the transformer-based multi-instance cross-encoder. Empirical results on datasets of the above two domains show the efficacy of our methods.
Parallel Instance Query Network for Named Entity Recognition
Yongliang Shen | Xiaobin Wang | Zeqi Tan | Guangwei Xu | Pengjun Xie | Fei Huang | Weiming Lu | Yueting Zhuang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Yongliang Shen | Xiaobin Wang | Zeqi Tan | Guangwei Xu | Pengjun Xie | Fei Huang | Weiming Lu | Yueting Zhuang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Named entity recognition (NER) is a fundamental task in natural language processing. Recent works treat named entity recognition as a reading comprehension task, constructing type-specific queries manually to extract entities. This paradigm suffers from three issues. First, type-specific queries can only extract one type of entities per inference, which is inefficient. Second, the extraction for different types of entities is isolated, ignoring the dependencies between them. Third, query construction relies on external knowledge and is difficult to apply to realistic scenarios with hundreds of entity types. To deal with them, we propose Parallel Instance Query Network (PIQN), which sets up global and learnable instance queries to extract entities from a sentence in a parallel manner. Each instance query predicts one entity, and by feeding all instance queries simultaneously, we can query all entities in parallel. Instead of being constructed from external knowledge, instance queries can learn their different query semantics during training. For training the model, we treat label assignment as a one-to-many Linear Assignment Problem (LAP) and dynamically assign gold entities to instance queries with minimal assignment cost. Experiments on both nested and flat NER datasets demonstrate that our proposed method outperforms previous state-of-the-art models.
Identifying Chinese Opinion Expressions with Extremely-Noisy Crowdsourcing Annotations
Xin Zhang | Guangwei Xu | Yueheng Sun | Meishan Zhang | Xiaobin Wang | Min Zhang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Xin Zhang | Guangwei Xu | Yueheng Sun | Meishan Zhang | Xiaobin Wang | Min Zhang
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent works of opinion expression identification (OEI) rely heavily on the quality and scale of the manually-constructed training corpus, which could be extremely difficult to satisfy. Crowdsourcing is one practical solution for this problem, aiming to create a large-scale but quality-unguaranteed corpus. In this work, we investigate Chinese OEI with extremely-noisy crowdsourcing annotations, constructing a dataset at a very low cost. Following Zhang el al. (2021), we train the annotator-adapter model by regarding all annotations as gold-standard in terms of crowd annotators, and test the model by using a synthetic expert, which is a mixture of all annotators. As this annotator-mixture for testing is never modeled explicitly in the training phase, we propose to generate synthetic training samples by a pertinent mixup strategy to make the training and testing highly consistent. The simulation experiments on our constructed dataset show that crowdsourcing is highly promising for OEI, and our proposed annotator-mixup can further enhance the crowdsourcing modeling.
Prompt-learning for Fine-grained Entity Typing
Ning Ding | Yulin Chen | Xu Han | Guangwei Xu | Xiaobin Wang | Pengjun Xie | Haitao Zheng | Zhiyuan Liu | Juanzi Li | Hong-Gee Kim
Findings of the Association for Computational Linguistics: EMNLP 2022
Ning Ding | Yulin Chen | Xu Han | Guangwei Xu | Xiaobin Wang | Pengjun Xie | Haitao Zheng | Zhiyuan Liu | Juanzi Li | Hong-Gee Kim
Findings of the Association for Computational Linguistics: EMNLP 2022
As an effective approach to adapting pre-trained language models (PLMs) for specific tasks, prompt-learning has recently attracted much attention from researchers. By using cloze-style language prompts to stimulate the versatile knowledge of PLMs, prompt-learning can achieve promising results on a series of NLP tasks, such as natural language inference, sentiment classification, and knowledge probing. In this work, we investigate the application of prompt-learning on fine-grained entity typing in fully supervised, few-shot, and zero-shot scenarios. We first develop a simple and effective prompt-learning pipeline by constructing entity-oriented verbalizers and templates and conducting masked language modeling. Further, to tackle the zero-shot regime, we propose a self-supervised strategy that carries out distribution-level optimization in prompt-learning to automatically summarize the information of entity types. Extensive experiments on four fine-grained entity typing benchmarks under fully supervised, few-shot, and zero-shot settings show the effectiveness of the prompt-learning paradigm and further make a powerful alternative to vanilla fine-tuning.
2021
Few-NERD: A Few-shot Named Entity Recognition Dataset
Ning Ding | Guangwei Xu | Yulin Chen | Xiaobin Wang | Xu Han | Pengjun Xie | Haitao Zheng | Zhiyuan Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Ning Ding | Guangwei Xu | Yulin Chen | Xiaobin Wang | Xu Han | Pengjun Xie | Haitao Zheng | Zhiyuan Liu
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Recently, considerable literature has grown up around the theme of few-shot named entity recognition (NER), but little published benchmark data specifically focused on the practical and challenging task. Current approaches collect existing supervised NER datasets and re-organize them to the few-shot setting for empirical study. These strategies conventionally aim to recognize coarse-grained entity types with few examples, while in practice, most unseen entity types are fine-grained. In this paper, we present Few-NERD, a large-scale human-annotated few-shot NER dataset with a hierarchy of 8 coarse-grained and 66 fine-grained entity types. Few-NERD consists of 188,238 sentences from Wikipedia, 4,601,160 words are included and each is annotated as context or a part of the two-level entity type. To the best of our knowledge, this is the first few-shot NER dataset and the largest human-crafted NER dataset. We construct benchmark tasks with different emphases to comprehensively assess the generalization capability of models. Extensive empirical results and analysis show that Few-NERD is challenging and the problem requires further research. The Few-NERD dataset and the baselines will be publicly available to facilitate the research on this problem.
2020
Coupling Distant Annotation and Adversarial Training for Cross-Domain Chinese Word Segmentation
Ning Ding | Dingkun Long | Guangwei Xu | Muhua Zhu | Pengjun Xie | Xiaobin Wang | Haitao Zheng
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Ning Ding | Dingkun Long | Guangwei Xu | Muhua Zhu | Pengjun Xie | Xiaobin Wang | Haitao Zheng
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Fully supervised neural approaches have achieved significant progress in the task of Chinese word segmentation (CWS). Nevertheless, the performance of supervised models always drops gravely if the domain shifts due to the distribution gap across domains and the out of vocabulary (OOV) problem. In order to simultaneously alleviate the issues, this paper intuitively couples distant annotation and adversarial training for cross-domain CWS. 1) We rethink the essence of “Chinese words” and design an automatic distant annotation mechanism, which does not need any supervision or pre-defined dictionaries on the target domain. The method could effectively explore domain-specific words and distantly annotate the raw texts for the target domain. 2) We further develop a sentence-level adversarial training procedure to perform noise reduction and maximum utilization of the source domain information. Experiments on multiple real-world datasets across various domains show the superiority and robustness of our model, significantly outperforming previous state-of-the-arts cross-domain CWS methods.
2019
DM_NLP at SemEval-2018 Task 12: A Pipeline System for Toponym Resolution
Xiaobin Wang | Chunping Ma | Huafei Zheng | Chu Liu | Pengjun Xie | Linlin Li | Luo Si
Proceedings of the 13th International Workshop on Semantic Evaluation
Xiaobin Wang | Chunping Ma | Huafei Zheng | Chu Liu | Pengjun Xie | Linlin Li | Luo Si
Proceedings of the 13th International Workshop on Semantic Evaluation
This paper describes DM-NLP’s system for toponym resolution task at Semeval 2019. Our system was developed for toponym detection, disambiguation and end-to-end resolution which is a pipeline of the former two. For toponym detection, we utilized the state-of-the-art sequence labeling model, namely, BiLSTM-CRF model as backbone. A lot of strategies are adopted for further improvement, such as pre-training, model ensemble, model averaging and data augment. For toponym disambiguation, we adopted the widely used searching and ranking framework. For ranking, we proposed several effective features for measuring the consistency between the detected toponym and toponyms in GeoNames. Eventually, our system achieved the best performance among all the submitted results in each sub task.
2015
Search
Fix author
Co-authors
- Pengjun Xie 18
- Yong Jiang 9
- Ning Ding 5
- Fei Huang 5
- Guangwei Xu 5
- Ningyu Zhang 5
- Hai-Tao Zheng 5
- Yulin Chen 4
- Runnan Fang 4
- Zhiyuan Liu 4
- Huajun Chen 3
- Fei Huang 3
- Shuofei Qiao 3
- Jialong Wu 3
- Shihao Cai 2
- Xu Han 2
- Songbo Hu 2
- Anna Korhonen 2
- Yuan Liang 2
- Weiming Lu 2
- Yongliang Shen 2
- Yueheng Sun 2
- Kewei Tu 2
- Xinyu Wang 2
- Meishan Zhang 2
- Yueting Zhuang 2
- Jiong Cai 1
- Yadong Chen 1
- Xiang Chen 1
- Ganqu Cui 1
- Jinyuan Fang 1
- Alexander Fraser 1
- Yu Hong (洪宇) 1
- Xuming Hu 1
- Shengding Hu 1
- Wenyang Hui 1
- Heng Ji 1
- Chengyue Jiang 1
- Hong-Gee Kim 1
- Baixuan Li 1
- Guangyu Li 1
- Yinghui Li 1
- Yangning Li 1
- Linlin Li 1
- Juanzi Li 1
- Yeqiu Li 1
- Yinhong Liu 1
- Chu Liu 1
- Dingkun Long 1
- Chao Lou 1
- Wei Lu 1
- Chunping Ma 1
- Shengyu Mao 1
- Zaiqiao Meng 1
- Zhisong Qiu 1
- Evgeniia Razumovskaia 1
- Baochang Ren 1
- Xiangyuan Ru 1
- Luo Si 1
- Liangcai Su 1
- Jin Su 1
- Zeqi Tan 1
- Zhengwei Tao 1
- Ivan Vuli\'c 1
- Ivan Vulić 1
- Tao Wang 1
- Jian Wang 1
- Peng Wang 1
- Shibin Wu 1
- Zekun Xi 1
- Jingheng Ye 1
- Wenbiao Yin 1
- Moy Yuan 1
- Fajie Yuan 1
- Zhen Zhang 1
- Wentao Zhang 1
- Tongtao Zhang 1
- Xin Zhang 1
- Xin Zhang 1
- Min Zhang 1
- Huafei Zheng 1
- Jingren Zhou 1
- Ej Zhou 1
- Muhua Zhu 1