Zheng Hu

2026

Beyond Noise: Characterizing Creative Potential in Unverifiable LLM Hallucinations
Yu Yan | Chunhong Zhang | Haiyu Zhao | Ziyang Zeng | Zihao Liu | Yongkang Wu | Jianzhou Diao | 陈奕杰 | Shujie Wang | Zheng Hu
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

In knowledge-intensive creative tasks, Large Language Models (LLMs) often generate outputs that extend beyond established knowledge, making direct verification against current evidence impractical. Unlike factual hallucinations checked against ground truth, such outputs arise naturally in creative generation, where extending beyond current knowledge is often the goal. Yet prior work debates whether hallucination should be suppressed or embraced without empirically analyzing this unverifiable subclass. On the ideation evaluation side, existing work focuses on individual outputs without characterizing the unverifiable space as a whole. To address this gap, we propose a novelty-verifiability characterization that distinguishes Creative Synthesis (Region A) from Groundless Fabrication (Region B), and study it through a conceptual creation task where LLMs synthesize novel scientific concepts. Through 32,400 generations across three technical domains and 1,080 human judgments, we find that Region A is non-negligible (4.7%) and robust, persisting across generation strategies, models, domains, and embedding choices. A retrospective recovery experiment further shows that LLMs can approximate post-cutoff scientific concepts in controlled combinatorial settings. Our findings suggest that the unverifiable space is not uniformly noise but exhibits empirically distinguishable internal structure, providing an empirical basis for more selective hallucination governance. (https://github.com/YuLab1/llm-concept-creation)

2025

Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction
Liping Liu | Chunhong Zhang | Likang Wu | Chuang Zhao | Zheng Hu | Ming He | Jianping Fan
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

Self-reflection for Large Language Models (LLMs) has gained significant attention. Existing approaches involve models iterating and improving their previous responses based on LLMs’ internal reflection ability or external feedback. However, recent research has raised doubts about intrinsic self-correction without external feedback, suggesting that it may even degrade performance. Based on our empirical evidence, we find that current static reflection methods may lead to redundancy, drift, and stubbornness issues. To mitigate this, we introduce **I**nstruct-**o**f-**R**eflec**t**ion (**IoRT**), a novel and general reflection framework that leverages dynamic-meta instruction to enhance the iterative reflection capability of LLMs. Specifically, we propose an instructor, driven by meta-thoughts and a self-consistency classifier, that generates various instructions, including refresh, stop, and select, to guide the next reflection iteration. Our experiments demonstrate that IoRT achieves an average improvement of 10.1% over established baselines in mathematical and commonsense reasoning tasks, highlighting its efficacy and applicability. Our code is available at https://github.com/llp635/IoRT.

StructuThink: Reasoning with Task Transition Knowledge for Autonomous LLM-Based Agents
Haiyu Zhao | Zhenyu Guo | Chunhong Zhang | Ziyu Zhou | Zheng Hu
Findings of the Association for Computational Linguistics: EMNLP 2025

Decision-making tasks have highlighted fundamental challenges in grounding decisions within real-world contexts. Traditional decision knowledge utilization methods often struggle to effectively integrate structured decision constraints, limiting their ability to decompose high-level tasks, maintain logical consistency, and adapt to dynamic environments. To bridge this gap, we introduce StructuThink, a knowledge-structured reasoning framework that enhances LLM-based agents with explicit decision constraints. Specifically, we propose the Task Transition Knowledge Graph (TTKG), which learns decision knowledge in embodied scenarios. Leveraging this knowledge, we propose the StructuThink framework, comprising a subtask chain constructor for grounding natural language instructions and a constraint-based executor for adaptive and consistent decision-making. We validate StructuThink across multiple benchmarks, including ALFWorld and WebShop, where it achieves higher task success rates (improving by up to 7%) and more efficient action sequences (requiring up to 15% fewer steps) than baseline methods. Our approach enables LLMs to more effectively ground decision-making in domain-specific scenarios, enhancing both interpretability and reliability, thus paving the way for more reliable and adaptable decision-making systems.

Open Information Extraction (OpenIE) aims to extract structured information in the form of triples from unstructured text, serving as a foundation for various downstream NLP tasks. Despite the success of neural OpenIE models, their dependence on large-scale annotated datasets poses a challenge, particularly in low-resource settings. In this paper, we introduce a novel approach to address the low-resource OpenIE task through two key innovations: (1) we improve the quality of training data by curating small-scale, high-quality datasets annotated by a large language model (GPT-3.5), leveraging both OpenIE principles and few-shot examples to form LSOIE-g principles and LSOIE-g examples; (2) we propose CycleOIE, a training framework that maximizes data efficiency through a cycle-consistency mechanism, enabling the model to learn effectively from minimal data. Experimental results show that CycleOIE, when trained on only 2k+ instances, achieves comparable results to models trained on over 90k instances. Our contributions are further validated through extensive experiments, demonstrating the superior performance of CycleOIE and our curated LSOIE-g datasets in low-resource OpenIE as well as revealing the internal mechanisms of CycleOIE.

All That Glitters is Not Gold: Improving Robust Retrieval-Augmented Language Models with Fact-Centric Preference Alignment
Jia Hao | Chunhong Zhang | Jiarun Liu | Haiyu Zhao | Zhiqiang Zhan | Zheng Hu
Findings of the Association for Computational Linguistics: ACL 2025

Retrieval-augmented language models (RALMs) rely on retrieved external knowledge to generate responses, which makes them vulnerable to retrieval results that contain noisy documents. Previous works integrate additional filters or finetune Large Language Models (LLMs) to learn adaptive retrieval in order to reduce the performance damage caused by noisy documents. However, prior noise filtering may lead to the loss of crucial information, and these methods do not focus on distracting documents with high semantic relevance, which pose the most challenging problem. In this study, we propose a training method for fact-centric preference alignment (FPA) to improve the ability of LLMs to directly extract useful information from noisy retrieval results without prior filtering. Our method performs positive document mining based on factual consistency and uses the LLMs' self-generated synthetic data as training data without manual annotation. We evaluate FPA on four question answering benchmarks, and the experimental results demonstrate that our method achieves significant improvement with a small scale of training data.

2021

More than Text: Multi-modal Chinese Word Segmentation
Dong Zhang | Zheng Hu | Shoushan Li | Hanqian Wu | Qiaoming Zhu | Guodong Zhou
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Chinese word segmentation (CWS) is undoubtedly an important basic task in natural language processing. Previous works focus only on the textual modality, but there are often audio and video utterances (such as news broadcasts and face-to-face dialogues) in which textual, acoustic and visual modalities normally coexist. To this end, we attempt to combine multiple modalities (mainly the converted text and actual voice information) to perform CWS. In this paper, we annotate a new dataset for CWS containing text and audio. Moreover, we propose a time-dependent multi-modal interactive model based on the Transformer framework to integrate multi-modal information for word sequence labeling. The experimental results on three different training sets show the effectiveness of our approach in fusing text and audio.
