Jun-Seong Kim
2024
Improving Multi-lingual Alignment Through Soft Contrastive Learning
Minsu Park | Seyeon Choi | Chanyeol Choi | Jun-Seong Kim | Jy-yong Sohn
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
Making decent multi-lingual sentence representations is critical to achieving high performance in cross-lingual downstream tasks. In this work, we propose a novel method to align multi-lingual embeddings based on the similarity of sentences measured by a pre-trained mono-lingual embedding model. Given translation sentence pairs, we train a multi-lingual model so that the similarity between cross-lingual embeddings follows the similarity of sentences measured by the mono-lingual teacher model. Our method can be considered as contrastive learning with soft labels defined as the similarity between sentences. Our experimental results on five languages show that our contrastive loss with soft labels far outperforms conventional contrastive loss with hard labels in various benchmarks for bitext mining tasks and STS tasks. In addition, our method outperforms existing multi-lingual embeddings, including LaBSE, on the Tatoeba dataset.
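Below is a minimal sketch (not the authors' released code) of the idea described in the abstract: the frozen mono-lingual teacher's sentence similarities define a soft target distribution, and the multi-lingual student's cross-lingual similarities are trained to match it. All function and variable names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(student_src, student_tgt, teacher_src, temperature=0.05):
    """student_src / student_tgt: (B, d) embeddings of a batch of translation
    pairs from the multi-lingual student; teacher_src: (B, d_t) embeddings of
    the source-language sentences from the frozen mono-lingual teacher."""
    # Cosine-similarity matrices over the batch.
    s = F.normalize(student_src, dim=-1) @ F.normalize(student_tgt, dim=-1).T
    t = F.normalize(teacher_src, dim=-1) @ F.normalize(teacher_src, dim=-1).T
    # Soft labels: teacher similarities turned into a distribution per row.
    soft_labels = F.softmax(t / temperature, dim=-1)
    log_probs = F.log_softmax(s / temperature, dim=-1)
    # KL(teacher || student). A hard-label contrastive loss would instead use
    # an identity target (each sentence matched only to its own translation).
    return F.kl_div(log_probs, soft_labels, reduction="batchmean")
```

With the temperature set low and the teacher similarity matrix replaced by the identity, this reduces to the conventional hard-label (InfoNCE-style) contrastive objective, which is the baseline the abstract compares against.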
2018
Modeling with Recurrent Neural Networks for Open Vocabulary Slots
Jun-Seong Kim | Junghoe Kim | SeungUn Park | Kwangyong Lee | Yoonju Lee
Proceedings of the 27th International Conference on Computational Linguistics
Dealing with ‘open-vocabulary’ slots has been among the challenges in the natural language area. While recent studies on attention-based recurrent neural network (RNN) models have performed well in several language-related tasks such as spoken language understanding and dialogue systems, there have been few attempts to address filling slots that take on values from a virtually unlimited set. In this paper, we propose a new RNN model that can capture the vital concept: understanding the role of a word may vary according to how long a reader focuses on a particular part of a sentence. The proposed model utilizes a long-term aware attention structure, positional encoding primarily considering the relative distance between words, and multi-task learning of a character-based language model and an intent detection model. We show that the model outperforms existing RNN models at discovering ‘open-vocabulary’ slots without any external information, such as a named entity database or knowledge base. In particular, we confirm that it performs better with a greater number of slots in a dataset, including unknown words, by evaluating the models on a dataset spanning several domains. In addition, the proposed model also demonstrates superior performance with regard to intent detection.
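The sketch below illustrates one ingredient the abstract names, attention scores that incorporate the relative distance between words, under stated assumptions; it is not the paper's implementation, and the long-term aware gating, character-level language model, and intent detection heads of the full multi-task model are omitted. All class and parameter names are hypothetical.

```python
import torch
import torch.nn as nn

class DistanceAwareAttention(nn.Module):
    def __init__(self, hidden_dim, max_dist=20):
        super().__init__()
        self.score = nn.Linear(2 * hidden_dim, 1)
        # One learned bias per clipped relative distance |i - j|.
        self.dist_bias = nn.Embedding(max_dist + 1, 1)
        self.max_dist = max_dist

    def forward(self, h):                      # h: (B, T, H) RNN hidden states
        B, T, H = h.shape
        # Pairwise concatenation of hidden states -> (B, T, T, 2H).
        pairs = torch.cat([h.unsqueeze(2).expand(B, T, T, H),
                           h.unsqueeze(1).expand(B, T, T, H)], dim=-1)
        idx = torch.arange(T, device=h.device)
        rel = (idx[:, None] - idx[None, :]).abs().clamp(max=self.max_dist)
        # Content score plus a bias depending only on relative distance.
        scores = self.score(pairs).squeeze(-1) + self.dist_bias(rel).squeeze(-1)
        weights = scores.softmax(dim=-1)       # (B, T, T) attention weights
        return weights @ h                     # context vector per time step
```

The resulting per-token context vectors would feed a slot-tagging output layer; the relative-distance bias lets the model attend differently to nearby versus far-away words, which is the intuition the abstract describes.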