2025
pdf
bib
abs
DROWN: Towards Tighter LiRPA-based Robustness Certification
Yunruo Zhang
|
Tianyu Du
|
Shouling Ji
|
Shanqing Guo
Proceedings of the 31st International Conference on Computational Linguistics
The susceptibility of deep neural networks to adversarial attacks is a well-established concern. To address this problem, robustness certification is proposed, which, unfortunately, suffers from precision or scalability issues. In this paper, we present DROWN (Dual CROWN), a novel method for certifying the robustness of DNNs. The advantage of DROWN is that it tightens classic LiRPA-based methods yet maintains similar scalability, which comes from refining pre-activation bounds of ReLU relaxations using two pairs of linear bounds derived from different relaxations of ReLU units in previous layers. The extensive evaluations show that DROWN achieves up to 83.39% higher certified robust accuracy than the baseline on CNNs and up to 4.68 times larger certified radii than the baseline on Transformers. Meanwhile, the running time of DROWN is about twice that of the baseline.
pdf
bib
abs
Probing the Geometry of Truth: Consistency and Generalization of Truth Directions in LLMs Across Logical Transformations and Question Answering Tasks
Yuntai Bao
|
Xuhong Zhang
|
Tianyu Du
|
Xinkui Zhao
|
Zhengwen Feng
|
Hao Peng
|
Jianwei Yin
Findings of the Association for Computational Linguistics: ACL 2025
Large language models (LLMs) are trained on extensive datasets that encapsulate substantial world knowledge. However, their outputs often include confidently stated inaccuracies. Earlier works suggest that LLMs encode truthfulness as a distinct linear feature, termed the “truth direction”, which can classify truthfulness reliably. We address several open questions about the truth direction: (i) whether LLMs universally exhibit consistent truth directions; (ii) whether sophisticated probing techniques are necessary to identify truth directions; and (iii) how the truth direction generalizes across diverse contexts.Our findings reveal that not all LLMs exhibit consistent truth directions, with stronger representations observed in more capable models, particularly in the context of logical negation.Additionally, we demonstrate that truthfulness probes trained on declarative atomic statements can generalize effectively to logical transformations, question-answering tasks, in-context learning, and external knowledge sources.Finally, we explore the practical application of truthfulness probes in selective question-answering, illustrating their potential to improve user trust in LLM outputs.These results advance our understanding of truth directions and provide new insights into the internal representations of LLM beliefs.
2024
pdf
bib
abs
ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis
Yanming Liu
|
Xinyue Peng
|
Tianyu Du
|
Jianwei Yin
|
Weihao Liu
|
Xuhong Zhang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) have achieved commendable accomplishments in various natural language processing tasks. However, LLMs still encounter significant challenges when dealing with complex scenarios involving multiple entities. These challenges arise from the presence of implicit relationships that demand multi-step reasoning. In this paper, we propose a novel approach ERA-CoT, which aids LLMs in understanding context by capturing relationships between entities and supports the reasoning of diverse tasks through Chain-of-Thoughts (CoT).Experimental results show that ERA-CoT demonstrates the superior performance of our proposed method compared to current CoT prompting methods, achieving a significant improvement of an average of 5.1% on GPT3.5 compared to previous SOTA baselines. Our analysis indicates that ERA-CoT increases the LLM’s understanding of entity relationships, significantly improves the accuracy of question answering, and enhances the reasoning ability of LLMs.
pdf
bib
abs
SecCoder: Towards Generalizable and Robust Secure Code Generation
Boyu Zhang
|
Tianyu Du
|
Junkai Tong
|
Xuhong Zhang
|
Kingsum Chow
|
Sheng Cheng
|
Xun Wang
|
Jianwei Yin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
After large models (LMs) have gained widespread acceptance in code-related tasks, their superior generative capacity has greatly promoted the application of the code LM. Nevertheless, the security of the generated code has raised attention to its potential damage. Existing secure code generation methods have limited generalizability to unseen test cases and poor robustness against the attacked model, leading to safety failures in code generation. In this paper, we propose a generalizable and robust secure code generation method SecCoder by using in-context learning (ICL) and the safe demonstration. The dense retriever is also used to select the most helpful demonstration to maximize the improvement of the generated code’s security. Experimental results show the superior generalizability of the proposed model SecCoder compared to the current secure code generation method, achieving a significant security improvement of an average of 7.20% on unseen test cases. The results also show the better robustness of SecCoder compared to the current attacked code LM, achieving a significant security improvement of an average of 7.74%. Our analysis indicates that SecCoder enhances the security of LMs in generating code, and it is more generalizable and robust.
pdf
bib
abs
RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback
Yanming Liu
|
Xinyue Peng
|
Xuhong Zhang
|
Weihao Liu
|
Jianwei Yin
|
Jiannan Cao
|
Tianyu Du
Findings of the Association for Computational Linguistics: ACL 2024
Large language models (LLMs) demonstrate exceptional performance in numerous tasks but still heavily rely on knowledge stored in their parameters. Moreover, updating this knowledge incurs high training costs. Retrieval-augmented generation (RAG) methods address this issue by integrating external knowledge. The model can answer questions it couldn’t previously by retrieving knowledge relevant to the query. This approach improves performance in certain scenarios for specific tasks. However, if irrelevant texts are retrieved, it may impair model performance. In this paper, we propose Retrieval Augmented Iterative Self-Feedback (RA-ISF), a framework that iteratively decomposes tasks and processes them in three submodules to enhance the model’s problem-solving capabilities. Experiments show that our method outperforms existing benchmarks, performing well on models like GPT3.5, Llama2, significantly enhancing factual reasoning capabilities and reducing hallucinations.