Shiqi Zhou
Also published as: 士祺 周
2025
Tag-Evol: Achieving Efficient Instruction Evolving via Tag Injection
Yixuan Wang | Shiqi Zhou | Chuanzhe Guo | Qingfu Zhu
Findings of the Association for Computational Linguistics: ACL 2025
Evol-Instruct has made significant improvements as a data synthesis method in several areas. Existing methods typically rely on a fixed set of strategies to evolve, which require manual design and are monolithic in form. In addition, iterative evolution also makes the acquisition of hard samples expensive. In view of this, we propose the Tag-Evol framework, a more diverse and efficient instruction evolving method. Specifically, Tag-Evol uses diverse and specific knowledge tags as strategies to achieve controlled evolution by injecting different combinations of tags into the original instructions. Experiments with multiple backbones in mathematical and code domain benchmarks show that the proposed method generates significantly better evolved data than other methods. Furthermore, we conduct a thorough analysis of the evolved data, demonstrating that Tag-Evol is not only efficient but also generates more diverse and challenging data.
2024
SpanCS: Span-Level Code-Switching for Cross-Lingual Code Generation (SpanCS:面向跨语言代码生成的片段级语码转换)
Qingfu Zhu (朱庆福) | Shiqi Zhou (周士祺) | Shuo Wang (王硕) | Zhiming Zhang (张致铭) | Haoyu Wang (王昊钰) | Qiguang Chen (陈麒光) | Wanxiang Che (车万翔)
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
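Cross-lingual code generation aims to transfer English-to-code generation ability to other natural languages. Translate-Train and Code-Switching are two classic data augmentation approaches to cross-lingual transfer; their strengths are complementary, but they have not yet been combined effectively. To this end, this paper proposes a span-level code-switching (SpanCS) method for cross-lingual code generation. First, the method uses a code-switching framework to link source-language context with target-language spans, promoting interaction and alignment across languages. Second, it uses the Translate-Train approach to extract target-language spans from a complete translation of the source-language text, ensuring semantic consistency between the augmented data and the original data. To fairly evaluate differences in code generation performance across natural languages, the paper constructs MHumanEval, a multilingual code generation benchmark covering 10 natural languages, built on HumanEval through manual translation and verification. Experiments with three backbone models on this benchmark show that SpanCS consistently outperforms previous data augmentation methods on cross-lingual code generation.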
Co-authors
- Qingfu Zhu (朱庆福) 2
- Wanxiang Che (车万翔) 1
- Qiguang Chen (陈麒光) 1
- Chuanzhe Guo 1
- Shuo Wang (王硕) 1