Li Mengxiang


2024

Dual Prompt Tuning based Contrastive Learning for Hierarchical Text Classification
Sishi Xiong | Yu Zhao | Jie Zhang | Li Mengxiang | Zhongjiang He | Xuelong Li | Shuangyong Song
Findings of the Association for Computational Linguistics: ACL 2024

Hierarchical text classification aims at categorizing texts into a multi-tiered, tree-structured hierarchy of labels. Existing methods focus on capturing hierarchy-aware text features by exploiting explicit parent-child relationships, while interactions between peer labels are rarely taken into account, resulting in severe label confusion within each layer. In this work, we propose a novel Dual Prompt Tuning (DPT) method, which emphasizes identifying discrimination among peer labels by performing contrastive learning on each hierarchical layer. We design an innovative hand-crafted prompt containing slots for both positive and negative label predictions to cooperate with contrastive learning. In addition, we introduce a label hierarchy self-sensing auxiliary task to ensure cross-layer label consistency. Extensive experiments demonstrate that DPT achieves significant improvements and outperforms the current state-of-the-art methods on the BGC and RCV1-V2 benchmark datasets.

TeleChat: An Open-source Bilingual Large Language Model
Zihan Wang | Liuxz2@chinatelecom.cn | Liusx14@chinatelecom.cn | Yitong Yao | Huangyy121@chinatelecom.cn | Li Mengxiang | Zhongjiang He | Liyx25@chinatelecom.cn | Pulw@chinatelecom.cn | Xuhn@chinatelecom.cn | Chao Wang | Shuangyong Song
Proceedings of the 10th SIGHAN Workshop on Chinese Language Processing (SIGHAN-10)

In this paper, we present TeleChat, a collection of large language models (LLMs) with 7 billion and 12 billion parameters. TeleChat is initially pretrained on an extensive corpus containing a diverse collection of English and Chinese texts, encompassing trillions of tokens. Subsequently, the model undergoes fine-tuning to align with human preferences, following a detailed methodology that we describe. We evaluate the performance of TeleChat on various tasks, including general dialogue generation, language understanding, mathematics, reasoning, code generation, and knowledge-based question answering. Our findings indicate that TeleChat achieves state-of-the-art performance compared to other open-source models of similar size across a wide range of public benchmarks. To support future research and applications utilizing LLMs, we release the fine-tuned model checkpoints of TeleChat-7B and TeleChat-12B, along with code and a portion of our filtered high-quality pretraining data, to the public community.