Hao Chen

UC Davis

Other people with similar names: Hao Chen (Tsinghua), Hao Chen (Chinese Academy of Sciences), Hao Chen (South China Normal University), Hao Chen (HKUST), Hao Chen (Nankai), Hao Chen (Hong Kong Polytechnic), Hao Chen, Hao Chen, Hao Chen (Zhejiang), Hao Chen (Dalian, Alibaba)

Unverified author pages with similar names: Hao Chen

2026

pdf bib abs

Content Fuzzing for Escaping Information Cocoons on Digital Social Media
Yifeng He | Ziye Tang | Hao Chen
Findings of the Association for Computational Linguistics: ACL 2026

Information cocoons on social media limit users’ exposure to posts with diverse viewpoints. Modern platforms use stance detection as an important signal in recommendation and ranking pipelines, which can route posts primarily to like-minded audiences and reduce cross-cutting exposure. This restricts the reach of dissenting opinions and hinders constructive discourse. We take the creator’s perspective and investigate how content can be revised to reach beyond existing affinity clusters. We present ContentFuzz, a confidence-guided fuzzing framework that rewrites posts while preserving their human-interpreted intent and induces different machine-inferred stance labels. ContentFuzz aims to route posts beyond their original cocoons. Our method guides a large language model (LLM) to generate meaning-preserving rewrites using confidence feedback from stance detection models. Evaluated on four representative stance detection models across three datasets in two languages, ContentFuzz effectively changes machine-classified stance labels, while maintaining semantic integrity with respect to the original content.

2025

pdf bib abs

FuzzAug: Data Augmentation by Coverage-guided Fuzzing for Neural Test Generation
Yifeng He | Jicheng Wang | Yuyang Rong | Hao Chen
Findings of the Association for Computational Linguistics: EMNLP 2025

Testing is essential to modern software engineering for building reliable software.Given the high costs of manually creating test cases,automated test case generation, particularly methods utilizing large language models,has become increasingly popular.These neural approaches generate semantically meaningful tests that are more maintainable compared with traditional automated testing methods such as fuzzing.However, the diversity and volume of unit tests in current datasets are limited, especially for newer but important languages.In this paper, we present a novel data augmentation technique, *FuzzAug*,that brings the benefits of fuzzing to large language models by incorporating valid testing semantics and providing diverse coverage-guided inputs.Doubling the size of training datasets,FuzzAug improves performance over the baselines significantly.This technique demonstrates the potential of introducing prior knowledge from dynamic software analysisto improve neural test generation,offering significant enhancements in this task.Our code is open-sourced at https://github.com/SecurityLab-UCD/FuzzAug.

pdf bib abs

Active Layer-Contrastive Decoding Reduces Hallucination in Large Language Model Generation
Hongxiang Zhang | Hao Chen | Muhao Chen | Tianyi Zhang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Recent decoding methods improve the factuality of large language models (LLMs) by refining how the next token is selected during generation. These methods typically operate at the token level, leveraging internal representations to suppress superficial patterns. Nevertheless, LLMs remain prone to hallucinations, especially over longer contexts. In this paper, we propose Active Layer-Contrastive Decoding (ActLCD), a novel decoding strategy that actively decides when to apply contrasting layers during generation. By casting decoding as a sequential decision-making problem, ActLCD employs a reinforcement learning policy guided by a reward-aware classifier to optimize factuality beyond the token level. Our experiments demonstrate that ActLCD surpasses state-of-the-art methods across five benchmarks, showcasing its effectiveness in mitigating hallucinations in diverse generation scenarios.

2023

pdf bib abs

Understanding Programs by Exploiting (Fuzzing) Test Cases
Jianyu Zhao | Yuyang Rong | Yiwen Guo | Yifeng He | Hao Chen
Findings of the Association for Computational Linguistics: ACL 2023

Semantic understanding of programs has attracted great attention in the community. Inspired by recent successes of large language models (LLMs) in natural language understanding, tremendous progress has been made by treating programming language as another sort of natural language and training LLMs on corpora of program code. However, programs are essentially different from texts after all, in a sense that they are normally heavily structured and syntax-strict. In particular, programs and their basic units (i.e., functions and subroutines) are designed to demonstrate a variety of behaviors and/or provide possible outputs, given different inputs. The relationship between inputs and possible outputs/behaviors represents the functions/subroutines and profiles the program as a whole. Hence, we propose to incorporate such a relationship into learning, for achieving a deeper semantic understanding of programs. To obtain inputs that are representative enough to trigger the execution of most part of the code, we resort to fuzz testing and propose fuzz tuning to boost the performance of program understanding and code representation learning, given a pre-trained LLM. The effectiveness of the proposed method is verified on two program understanding tasks including code clone detection and code classification, and it outperforms current state-of-the-arts by large margins. Code is available at https://github.com/rabbitjy/FuzzTuning.

Co-authors

Venues

Findings3
EMNLP1

Fix author