Ziqi Jin
2025
Self-Harmonized Chain of Thought
Ziqi Jin
|
Wei Lu
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Chain-of-thought (CoT) prompting has demonstrated the capacity of large language models to perform complex reasoning through intermediate steps. While effective, current CoT methods face challenges: Zero-shot-CoT can lead to reasoning errors, and Few-shot-CoT requires labor-intensive manual demonstrations. Auto-CoT attempts to address these issues by automatically generating diverse demonstrations, but this diversity can lead to inconsistent reasoning patterns. We propose ECHO (Self-Harmonized Chain of Thought), a novel method that unifies diverse solution paths into a consistent and effective reasoning pattern. ECHO employs an iterative process to refine and harmonize automatically generated demonstrations, mitigating the limitations of existing approaches. Our comprehensive experiments across arithmetic, commonsense, and symbolic reasoning tasks demonstrate that ECHO outperforms Auto-CoT by an average of 2.8%. These findings suggest that ECHO represents a significant step towards more robust and generalizable automated reasoning in large language models.
2024
Sailor: Open Language Models for South-East Asia
Longxu Dou
|
Qian Liu
|
Guangtao Zeng
|
Jia Guo
|
Jiahui Zhou
|
Xin Mao
|
Ziqi Jin
|
Wei Lu
|
Min Lin
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
We present Sailor, a family of open language models ranging from 0.5B to 14B parameters, tailored for South-East Asian (SEA) languages. From Qwen1.5, Sailor models accept 200B to 400B tokens during continual pre-training, primarily covering the languages of English, Chinese, Vietnamese, Thai, Indonesian, Malay, and Lao. The training leverages several techniques, including BPE dropout for improving the model robustness, aggressive data cleaning and deduplication, and small proxy models to optimize the data mixture. Experimental results on four typical tasks indicate that Sailor models demonstrate strong performance across different benchmarks, including commonsense reasoning, question answering, reading comprehension and examination. We share our insights to spark a wider interest in developing large language models for multilingual use cases.