Yi-Ren Yeh

2026

MegaRAG: Multimodal Knowledge Graph-Based Retrieval Augmented Generation
Chi-Hsiang Hsiao | Yi-Cheng Wang | Tzung-Sheng Lin | Yi-Ren Yeh | Chu-song Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Retrieval-augmented generation (RAG) enables large language models (LLMs) to dynamically access external information, which is powerful for answering questions over previously unseen documents. Nonetheless, they struggle with high-level conceptual understanding and holistic comprehension due to limited context windows, which constrain their ability to perform deep reasoning over long-form, domain-specific content such as full-length books. To solve this problem, knowledge graphs (KGs) have been leveraged to provide entity-centric structure and hierarchical summaries, offering more structured support for reasoning. However, existing KG-based RAG solutions remain restricted to text-only inputs and fail to leverage the complementary insights provided by other modalities such as vision. On the other hand, reasoning from visual documents requires textual, visual, and spatial cues into structured, hierarchical concepts. To address this issue, we introduce a multimodal knowledge graph-based RAG that enables cross-modal reasoning for better content understanding. Our method incorporates visual cues into the construction of knowledge graphs, the retrieval phase, and the answer generation process. Experimental results across both global and fine-grained question answering tasks show that our approach consistently outperforms existing approaches on both textual and multimodal benchmarks.

2021

pdf bib abs

Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition
Yi-Chang Chen | Chun-Yen Cheng | Chien-An Chen | Ming-Chieh Sung | Yi-Ren Yeh
Proceedings of the 33rd Conference on Computational Linguistics and Speech Processing (ROCLING 2021)

Due to the recent advances of natural language processing, several works have applied the pre-trained masked language model (MLM) of BERT to the post-correction of speech recognition. However, existing pre-trained models only consider the semantic correction while the phonetic features of words is neglected. The semantic-only post-correction will consequently decrease the performance since homophonic errors are fairly common in Chinese ASR. In this paper, we proposed a novel approach to collectively exploit the contextualized representation and the phonetic information between the error and its replacing candidates to alleviate the error rate of Chinese ASR. Our experiment results on real world speech recognition datasets showed that our proposed method has evidently lower CER than the baseline model, which utilized a pre-trained BERT MLM as the corrector.

Co-authors

Tzung-Sheng Lin 1

Ming-Chieh Sung 1

Yi-Cheng Wang 1

Venues

ACL1
ROCLING1

Fix author