Zeyu Xiong
2025
DPGA-TextSyn: Differentially Private Genetic Algorithm for Synthetic Text Generation
Zhonghao Sun
|
Zhiliang Tian
|
Yiping Song
|
Yuyi Si
|
Juhua Zhang
|
Minlie Huang
|
Kai Lu
|
Zeyu Xiong
|
Xinwang Liu
|
Dongsheng Li
Findings of the Association for Computational Linguistics: ACL 2025
Using large language models (LLMs) has a potential risk of privacy leakage since the data with sensitive information may be used for fine-tuning the LLMs. Differential privacy (DP) provides theoretical guarantees of privacy protection, but its practical application in LLMs still has the problem of privacy-utility trade-off. Researchers synthesized data with strong generation capabilities closed-source LLMs (i.e., GPT-4) under DP to alleviate this problem, but this method is not so flexible in fitting the given privacy distributions without fine-tuning. Besides, such methods can hardly balance the diversity of synthetic data and its relevance to target privacy data without accessing so much private data. To this end, this paper proposes DPGA-TextSyn, combining general LLMs with genetic algorithm (GA) to produce relevant and diverse synthetic text under DP constraints. First, we integrate the privacy gene (i.e., metadata) to generate better initial samples. Then, to achieve survival of the fittest and avoid homogeneity, we use privacy nearest neighbor voting and similarity suppression to select elite samples. In addition, we expand elite samples via genetic strategies such as mutation, crossover, and generation to expand the search scope of GA. Experiments show that this method significantly improves the performance of the model in downstream tasks while ensuring privacy.
2022
Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding
Jiahao Zhu
|
Daizong Liu
|
Pan Zhou
|
Xing Di
|
Yu Cheng
|
Song Yang
|
Wenzheng Xu
|
Zichuan Xu
|
Yao Wan
|
Lichao Sun
|
Zeyu Xiong
Findings of the Association for Computational Linguistics: EMNLP 2022
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then interact them with query for reasoning.However, we argue that these methods have overlooked two indispensable issues:1) Boundary-bias: The annotated target segment generally refers to two specific frames as corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries.2) Reasoning-bias: Such incorrect new boundary frames also lead to the reasoning bias during frame-query interaction, reducing the generalization ability of model.To alleviate above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding.Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
Search
Fix author
Co-authors
- Yu Cheng 1
- Xing Di 1
- Minlie Huang 1
- Dongsheng Li 1
- Daizong Liu 1
- show all...