Kai Tang


2024

pdf
Learning Geometry-Aware Representations for New Intent Discovery
Kai Tang | Junbo Zhao | Xiao Ding | Runze Wu | Lei Feng | Gang Chen | Haobo Wang
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

New intent discovery (NID) is an important problem for deploying practical dialogue systems, which trains intent classifiers on a semi-supervised corpus where unlabeled user utterances contain both known and novel intents. Most existing NID algorithms place hope on the sample similarity to cluster unlabeled corpus to known or new samples. Lacking supervision on new intents, we experimentally find the intent classifier fails to fully distinguish new intents since they tend to assemble into intertwined centers.To address this problem, we propose a novel GeoID framework that learns geometry-aware representations to maximally separate all intents. Specifically, we are motivated by the recent findings on Neural Collapse (NC) in classification tasks to derive optimal intent center structure. Meanwhile, we devise a dual pseudo-labeling strategy based on optimal transport assignments and semi-supervised clustering, ensuring proper utterances-to-center arrangement.Extensive results show that our GeoID method establishes a new state-of-the-art performance, achieving a +3.49% average accuracy improvement on three standardized benchmarking datasets. We also verify its usefulness in assisting large language models for improved in-context performance.

2005

pdf
Improving Translation Memory with Word Alignment Information
Hua Wu | Haifeng Wang | Zhanyi Liu | Kai Tang
Proceedings of Machine Translation Summit X: Posters

This paper describes a generalized translation memory system, which takes advantage of sentence level matching, sub-sentential matching, and pattern-based machine translation technologies. All of the three techniques generate translation suggestions with the assistance of word alignment information. For the sentence level matching, the system generates the translation suggestion by modifying the translations of the most similar example with word alignment information. For sub-sentential matching, the system locates the translation fragments in several examples with word alignment information, and then generates the translation suggestion by combining these translation fragments. For pattern-based machine translation, the system first extracts translation patterns from examples using word alignment information and then generates translation suggestions with pattern matching. This system is compared with a traditional translation memory system without word alignment information in terms of translation efficiency and quality. Evaluation results indicate that our system improves the translation quality and saves about 20% translation time.