Yitong Yao

2025

In this paper, we present a novel pipeline for the XLLM Shared Task-III: Large Language Model for Structural Reasoning (LLM-SR). Our pipeline addresses key challenges in automatically constructing process-reward training data, such as high manual annotation costs, the limited accuracy of large models in processing structured data, and the dependence on auxiliary information for validation. To overcome these limitations, we first decompose the construction process into extraction and validation phases. Leveraging model-generated annotations, we produce pseudo-labeled data and iteratively refine model performance. Second, by analyzing structured data patterns, we encode structural constraints into a rule-based module and fine-tune the model with Group Relative Policy Optimization (GRPO), significantly improving the success rate of structured data extraction. Finally, we train the model to generate critical responses that assess evidence-conclusion relationships, thus enhancing validation reliability. Experimental results demonstrate that our pipeline outperforms models with an order of magnitude more parameters and ranks first on the task.
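The abstract does not describe the rule-based module in detail. As a rough illustration only, the sketch below shows how a rule-based structural reward could feed GRPO-style group-normalized advantages. It assumes a hypothetical JSON output schema (the `statement`, `evidence`, and `verification` keys, and the helper names `structural_reward` and `group_relative_advantages`, are assumptions, not the task's confirmed format or the authors' code).

```python
import json
import statistics

# Hypothetical schema (an assumption): every extracted item must be a dict
# carrying these keys, with "verification" set to "true" or "false".
REQUIRED_KEYS = {"statement", "evidence", "verification"}


def structural_reward(completion: str) -> float:
    """Rule-based reward: the fraction of items in a JSON list that satisfy
    the hypothetical schema; 0.0 if the completion is not a non-empty list."""
    try:
        items = json.loads(completion)
    except json.JSONDecodeError:
        return 0.0  # output is not valid JSON at all
    if not isinstance(items, list) or not items:
        return 0.0
    valid = sum(
        1
        for item in items
        if isinstance(item, dict)
        and REQUIRED_KEYS.issubset(item)
        and item["verification"] in ("true", "false")
    )
    return valid / len(items)


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each reward by the mean and
    population standard deviation of its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0 for _ in rewards]  # identical rewards give no signal
    return [(r - mean) / std for r in rewards]
```

For a group of three sampled completions in which only the first satisfies the schema, the rewards are `[1.0, 0.0, 0.0]`, so the first completion receives a positive advantage and the other two receive equal negative ones.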

2024

TeleChat: An Open-source Billingual Large Language Model
Zihan Wang | Liuxz2@chinatelecom.cn | Liusx14@chinatelecom.cn | Yitong Yao | Huangyy121@chinatelecom.cn | Li Mengxiang | Zhongjiang He | Liyx25@chinatelecom.cn | Pulw@chinatelecom.cn | Xuhn@chinatelecom.cn | Chao Wang | Shuangyong Song
Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10)

In this paper, we present TeleChat, a collection of large language models (LLMs) with 7 billion and 12 billion parameters. TeleChat is first pretrained on an extensive corpus of diverse English and Chinese texts comprising trillions of tokens. The model is then fine-tuned to align with human preferences, following a detailed methodology that we describe. We evaluate TeleChat on a range of tasks, including general dialogue generation, language understanding, mathematics, reasoning, code generation, and knowledge-based question answering. Our findings indicate that TeleChat achieves state-of-the-art performance compared with other open-source models of similar size across a wide range of public benchmarks. To support future research and applications built on LLMs, we release the fine-tuned model checkpoints of TeleChat-7B and TeleChat-12B, along with code and a portion of our filtered high-quality pretraining data, to the public community.
