2025
UCS-SQL: Uniting Content and Structure for Enhanced Semantic Bridging In Text-to-SQL
Zhenhe Wu
|
Zhongqiu Li
|
Jie Zhang
|
Zhongjiang He
|
Jian Yang
|
Yu Zhao
|
Ruiyu Fang
|
Bing Wang
|
Hongyan Xie
|
Shuangyong Song
|
Zhoujun Li
Findings of the Association for Computational Linguistics: ACL 2025
With the rapid advancement of large language models (LLMs), recent research has increasingly exploited the strong text/code understanding and generation capabilities of LLMs to tackle text-to-SQL tasks. Traditional approaches adopt schema linking to first eliminate redundant tables and columns and then prompt LLMs for SQL generation. However, they often struggle to accurately identify the corresponding tables and columns, owing to discrepancies in naming conventions between natural language questions (NL) and database schemas. Moreover, existing methods overlook the challenge of effectively transforming structural information from NL into SQL. To address these limitations, we introduce UCS-SQL, a novel text-to-SQL framework that unites content and structure pipes to bridge the gap between NL and SQL. Specifically, the content pipe focuses on identifying key content within the original question, while the structure pipe is dedicated to transforming the linguistic structure from NL to SQL. Additionally, we strategically select few-shot examples by considering both the SQL Skeleton and the Question Expression (SS-QE selection method), thus providing targeted examples for SQL generation. Experimental results on BIRD and Spider demonstrate the effectiveness of our UCS-SQL framework.
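As an illustration of the SS-QE idea, the minimal Python sketch below ranks candidate few-shot examples by combining the similarity of their SQL skeletons with the similarity of their question expressions. The skeletonization rules, the Jaccard scoring, the equal weighting, and the use of a draft SQL query to obtain the target-side skeleton are all assumptions for demonstration, not the paper's exact procedure.

import re

def sql_skeleton(sql: str) -> str:
    # Mask string/number literals, then replace non-keyword tokens with "_".
    s = re.sub(r"'[^']*'|\"[^\"]*\"|\b\d+(?:\.\d+)?\b", "_", sql)
    keywords = {"select", "from", "where", "group", "by", "order", "having",
                "join", "on", "and", "or", "not", "in", "limit", "count",
                "avg", "sum", "max", "min", "distinct", "as"}
    tokens = re.findall(r"\w+|[<>=!*(),.]", s.lower())
    return " ".join(t if t in keywords or not t.isidentifier() else "_"
                    for t in tokens)

def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

def select_examples(question, draft_sql, pool, k=4, w_skel=0.5):
    # pool: list of (example_question, example_sql) pairs
    draft_skel = sql_skeleton(draft_sql)
    return sorted(
        pool,
        key=lambda ex: (w_skel * jaccard(draft_skel, sql_skeleton(ex[1]))
                        + (1 - w_skel) * jaccard(question.lower(), ex[0].lower())),
        reverse=True,
    )[:k]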
INT: Establishing Information Transfer for Multilingual Intent Detection and Slot Filling
Di Wu
|
Liting Jiang
|
Bohui Mao
|
Hongyan Xie
|
Haoxiang Su
|
Zhongjiang He
|
Ruiyu Fang
|
Shuangyong Song
|
Hao Huang
|
Xuelong Li
Findings of the Association for Computational Linguistics: ACL 2025
Multilingual spoken language understanding (SLU) involves intent detection (ID) and slot filling (SF) across multiple languages. The inherent linguistic diversity presents significant challenges in achieving performance comparable to traditional SLU. Recent studies have attempted to improve multilingual SLU performance by sharing multilingual encoders. However, these approaches have not directly established information flow between languages. To address this, we first demonstrate the feasibility of such information transfer and pinpoint the key challenges: prediction error mitigation and multilingual slot alignment. We then propose the INformation Transfer network (INT) to tackle these challenges. The gate unit in INT controls the information flow between languages, reducing the adverse impact of prediction errors on both ID and SF. Additionally, we reformulate SF as a span prediction problem and introduce a slot-matching attention mechanism to achieve slot alignment across languages. Experimental results on the MASSIVE and MASSIVE-UG datasets show that our model outperforms all baselines in overall accuracy across all languages, and demonstrates robust performance when different languages are used as the source.
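The gating idea can be sketched in a few lines of PyTorch: a sigmoid gate decides, per dimension, how much transferred source-language information enters the target-language representation, so that source-side prediction errors can be attenuated. The fusion form and dimensions below are assumptions, not the exact INT architecture.

import torch
import torch.nn as nn

class GateUnit(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, target_h: torch.Tensor, source_h: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) controls how much source information flows in;
        # unreliable source predictions can thus be down-weighted.
        g = torch.sigmoid(self.gate(torch.cat([target_h, source_h], dim=-1)))
        return g * source_h + (1 - g) * target_h

# usage: fused = GateUnit(768)(target_states, source_states)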
2024
Dual Prompt Tuning based Contrastive Learning for Hierarchical Text Classification
Sishi Xiong
|
Yu Zhao
|
Jie Zhang
|
Mengxiang Li
|
Zhongjiang He
|
Xuelong Li
|
Shuangyong Song
Findings of the Association for Computational Linguistics: ACL 2024
Hierarchical text classification aims at categorizing texts into a multi-tiered, tree-structured hierarchy of labels. Existing methods pay more attention to capturing hierarchy-aware text features by exploiting explicit parent-child relationships, while interactions between peer labels are rarely taken into account, resulting in severe label confusion within each layer. In this work, we propose a novel Dual Prompt Tuning (DPT) method, which emphasizes identifying discrimination among peer labels by performing contrastive learning on each hierarchical layer. We design an innovative hand-crafted prompt containing slots for both positive and negative label predictions to cooperate with contrastive learning. In addition, we introduce a label hierarchy self-sensing auxiliary task to ensure cross-layer label consistency. Extensive experiments demonstrate that DPT achieves significant improvements and outperforms current state-of-the-art methods on the BGC and RCV1-V2 benchmark datasets.
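As a rough illustration of per-layer contrastive learning over peer labels, the sketch below computes an InfoNCE-style loss that pulls a text representation toward its gold label and away from its peer labels within one hierarchy level. The loss form, temperature, and cosine similarity are assumptions rather than the exact DPT objective.

import torch
import torch.nn.functional as F

def layer_contrastive_loss(text_emb, gold_label_emb, peer_label_embs, tau=0.1):
    # text_emb: (d,), gold_label_emb: (d,), peer_label_embs: (n_peers, d)
    candidates = torch.cat([gold_label_emb.unsqueeze(0), peer_label_embs], dim=0)
    sims = F.cosine_similarity(text_emb.unsqueeze(0), candidates) / tau
    # the gold label sits at index 0, so it is the contrastive target
    return F.cross_entropy(sims.unsqueeze(0), torch.zeros(1, dtype=torch.long))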
Sentence Segmentation and Punctuation for Ancient Books Based on Supervised In-context Training
Shiquan Wang
|
Weiwei Fu
|
Mengxiang Li
|
Zhongjiang He
|
Yongxiang Li
|
Ruiyu Fang
|
Li Guan
|
Shuangyong Song
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
This paper describes the participation of team “TeleAI” in the Third International Ancient Chinese Language Information Processing Evaluation (EvalHan24). The competition comprises a joint task of sentence segmentation and punctuation, with open and closed tracks defined by the models and data allowed. In the final evaluation, our system achieved significantly better results than the baseline. Specifically, in the closed-track sentence segmentation task we obtained an F1 score of 0.8885, while in the sentence punctuation task we achieved an F1 score of 0.7129.
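For reference, the reported scores follow the usual F1 definition over predicted versus gold segmentation/punctuation decisions; a toy computation is sketched below. The representation as (position, mark) pairs is an assumption, and the official EvalHan24 scorer may define matches differently.

def f1_score(pred: set, gold: set) -> float:
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# e.g. (character_index, punctuation_mark) pairs
print(f1_score({(3, "，"), (8, "。")}, {(3, "，"), (7, "。")}))  # 0.5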
TeleChat: An Open-source Bilingual Large Language Model
Zihan Wang
|
Liuxz2@chinatelecom.cn
|
Liusx14@chinatelecom.cn
|
Yitong Yao
|
Huangyy121@chinatelecom.cn
|
Mengxiang Li
|
Zhongjiang He
|
Liyx25@chinatelecom.cn
|
Pulw@chinatelecom.cn
|
Xuhn@chinatelecom.cn
|
Chao Wang
|
Shuangyong Song
Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10)
In this paper, we present TeleChat, a collection of large language models (LLMs) with 7 billion and 12 billion parameters. TeleChat is initially pretrained on an extensive corpus containing a diverse collection of English and Chinese texts, encompassing trillions of tokens. Subsequently, the model undergoes fine-tuning to align with human preferences, following a detailed methodology that we describe. We evaluate the performance of TeleChat on various tasks, including general dialogue generation, language understanding, mathematics, reasoning, code generation, and knowledge-based question answering. Our findings indicate that TeleChat achieves performance comparable to other open-source models of similar size across a wide range of public benchmarks. To support future research and applications utilizing LLMs, we release the fine-tuned model checkpoints of TeleChat-7B and TeleChat-12B, along with the code and a portion of our filtered high-quality pretraining data, to the public community.
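Since the checkpoints are publicly released, they can presumably be loaded with the standard Hugging Face transformers API, as in the sketch below. The repository id "Tele-AI/telechat-7B" and the generation settings are assumptions; consult the official release for the exact usage.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Tele-AI/telechat-7B"  # assumed repo id for the 7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Give a brief introduction to TeleChat.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))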