Ting-Yu Yen
2025
Text-centric Alignment for Bridging Test-time Unseen Modality
Yun-Da Tsai
|
Ting-Yu Yen
|
Pei-Fu Guo
|
Zhe-Yan Li
|
Shou-De Lin
Findings of the Association for Computational Linguistics: EMNLP 2025
This paper addresses the challenge of handling unseen modalities and dynamic modality combinations at test time with our proposed text-centric alignment method. This training-free alignment approach unifies different input modalities into a single semantic text representation by leveraging in-context learning with Large Language Models and uni-modal foundation models. Our method significantly enhances the ability to manage unseen, diverse, and unpredictable modality combinations, making it suitable as a foundation for both generative and discriminative models. Our extensive experiments, conducted primarily on discriminative tasks, demonstrate that our approach is essential for LLMs to achieve strong modality alignment performance, and that it surpasses the limitations of traditional fixed-modality frameworks in embedding representations. This study contributes to the field by offering a flexible and effective solution for real-world applications where modality availability is dynamic and uncertain.
2020
MSD-1030: A Well-built Multi-Sense Evaluation Dataset for Sense Representation Models
Ting-Yu Yen
|
Yang-Yin Lee
|
Yow-Ting Shiue
|
Hen-Hsen Huang
|
Hsin-Hsi Chen
Proceedings of the Twelfth Language Resources and Evaluation Conference
Sense embedding models handle polysemy by giving each distinct meaning of a word form a separate representation. They are considered improvements over word embedding models, and their effectiveness is usually judged with benchmarks such as semantic similarity datasets. However, most of these datasets are not designed for evaluating sense embeddings. In this research, we show that there are at least six concerns about evaluating sense embeddings with existing benchmark datasets, including the large proportion of single-sense words and the unexpectedly inferior performance of several multi-sense models compared to their single-sense counterparts. These observations call into serious question whether evaluations based on these datasets can reflect a sense model’s ability to capture different meanings. To address these issues, we propose the Multi-Sense Dataset (MSD-1030), which contains a high ratio of multi-sense word pairs. A series of analyses and experiments shows that MSD-1030 serves as a more reliable benchmark for sense embeddings. The dataset is available at http://nlg.csie.ntu.edu.tw/nlpresource/MSD-1030/.
2018
GenSense: A Generalized Sense Retrofitting Model
Yang-Yin Lee
|
Ting-Yu Yen
|
Hen-Hsen Huang
|
Yow-Ting Shiue
|
Hsin-Hsi Chen
Proceedings of the 27th International Conference on Computational Linguistics
With the aid of recently proposed word embedding algorithms, the study of semantic similarity has advanced rapidly. However, many natural language processing tasks need sense-level representations. To address this issue, some studies have proposed sense embedding learning algorithms. In this paper, we present a generalization of an existing sense retrofitting model. The generalization incorporates three major components: the semantic relations between senses, the relation strength, and the semantic strength. Our experiments show that the generalized model outperforms previous approaches on three types of evaluation: semantic relatedness, contextual word similarity, and semantic difference.