Runlin Liu

2026

Beyond Superficial Tests: Adversarial Refinement for Reliable Property-Based Testing
Xiao Li | Runlin Liu | Zhe Zhang | Xiang Gao | Hailong Sun
Findings of the Association for Computational Linguistics: ACL 2026

Large Language Models (LLMs) have demonstrated remarkable proficiency in code generation, yet their application to Property-Based Testing (PBT) still suffers from a superficiality gap. While LLMs readily generate syntactically correct tests, they often fail to bridge the semantic gap between a code implementation and its intended invariant logic, producing weak properties that give a false sense of security. To address this, we introduce PROBE, an agentic framework that hardens software properties through Adversarial Refinement. Unlike traditional generation approaches, PROBE treats test generation as a game of semantic asymmetry: a Validator agent actively generates counter-implementations, that is, semantically incorrect implementations that nonetheless satisfy the generated property, to expose loopholes in the specification. PROBE also constructs a cross-functional semantic graph to capture deep dependencies that local analysis often misses. Extensive evaluation shows that PROBE increases mutation scores by 9.79% over baselines. In real-world deployment, PROBE identified 45 previously unknown bugs in top-tier libraries, all confirmed by developers, demonstrating its ability to uncover deep semantic defects.
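The idea of a counter-implementation can be illustrated with a small sketch. This is not PROBE's implementation; it is a hand-written toy example, assuming a sorting function as the code under test, and every name in it (`counter_impl`, `weak_property`, `hardened_property`, `holds`) is hypothetical. It uses only the Python standard library in place of a PBT framework. The weak property checks only that the output is ordered, so an incorrect implementation that drops duplicates still passes; adding a multiset-preservation check closes that loophole.

```python
import random
from collections import Counter


def reference_sort(xs):
    """Correct implementation (hypothetical code under test)."""
    return sorted(xs)


def counter_impl(xs):
    """Semantically wrong: silently drops duplicates, yet output is ordered."""
    return sorted(set(xs))


def weak_property(sort_fn, xs):
    """Superficial property: the output is in non-decreasing order."""
    out = sort_fn(xs)
    return all(a <= b for a, b in zip(out, out[1:]))


def hardened_property(sort_fn, xs):
    """Refined property: ordered AND a permutation of the input."""
    out = sort_fn(xs)
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    return ordered and Counter(out) == Counter(xs)


def holds(prop, sort_fn, trials=200, seed=0):
    """Check a property on random small lists; False on the first violation."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(0, 5) for _ in range(rng.randint(0, 8))]
        if not prop(sort_fn, xs):
            return False
    return True


if __name__ == "__main__":
    print("weak property accepts counter_impl:",
          holds(weak_property, counter_impl))
    print("hardened property accepts counter_impl:",
          holds(hardened_property, counter_impl))
    print("hardened property accepts reference_sort:",
          holds(hardened_property, reference_sort))
```

In this toy setting, a counter-implementation that survives the property is evidence that the property is too weak, and the refinement step is whatever additional check makes it fail while the correct implementation still passes.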

2024

In speech synthesis, there is a growing emphasis on using multimodal speech to improve robustness. A key challenge is the scarcity of datasets that pair audio with corresponding video. We incorporate modality alignment during pre-training on multimodal datasets, then freeze the video feature extraction component and the encoder module of the pretrained weights. This enables zero-shot generalization with effective cross-modal and cross-lingual transfer. We name this method ‘Uni-Dubbing’. The model is fine-tuned on both multimodal and single-modality audio data. In multimodal scenarios, it achieves a word error rate (WER) of 31.73%, improving on the previous best of 33.9%. It also performs well on metrics such as tone quality and synchronization. With single-modality audio, it achieves a WER of 36.08%, demonstrating adaptability to limited data. Its domain generalization capabilities are demonstrated across various language tasks in video translation and audio generation. Trained on 433 hours of audio data, it surpasses techniques using 200 hours of audiovisual data. The code and demo are available at https://diracer.github.io/unidubbing.

Co-authors

Xiao Li 1

Venues

ACL 1
Findings 1
