Chih-Ming Chen
2026
ESG-KG: A Multi-modal Knowledge Graph System for Automated Compliance Assessment
Li-Yang Chang | Chih-Ming Chen | Hen-Hsen Huang | Ming-Feng Tsai | An-Zi Yen | Chuan-Ju Wang
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)
Our system is built upon a multi-modal information extraction pipeline designed to process and interpret corporate sustainability reports. This integrated framework systematically handles diverse data formats—including text, tables, figures, and infographics—to extract, structure, and evaluate ESG-related content. The extracted multi-modal data is subsequently formalized into a structured knowledge graph (KG), which serves both as a semantic framework for representing entities, relationships, and metrics relevant to ESG domains and as the foundational infrastructure for the automated compliance system. This KG enables high-precision retrieval of information across multiple source formats and reporting modalities. The trustworthy, context-rich representations provided by the knowledge graph establish a verifiable evidence base, creating a critical foundation for reliable retrieval-augmented generation (RAG) and for subsequent LLM-based scoring and analysis within the automated ESG compliance system.
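The core idea of the abstract—KG triples with provenance as a verifiable evidence base for RAG—can be sketched minimally as follows. This is an illustrative sketch, not the authors' implementation; the entity names, predicates, and page-level provenance tags are all hypothetical.

```python
# Hypothetical (subject, predicate, value, provenance) triples extracted from
# the text, tables, and figures of a single sustainability report.
triples = [
    ("AcmeCorp", "scope1_emissions_tCO2e", "12500", "table:p14"),
    ("AcmeCorp", "renewable_energy_share", "0.42", "figure:p9"),
    ("AcmeCorp", "board_gender_diversity", "0.33", "text:p22"),
]

def evidence_for(graph, entity, predicate):
    """Return (value, provenance) pairs for a query, so that any answer an
    LLM generates downstream can be traced back to its source modality."""
    return [(v, src) for s, p, v, src in graph if s == entity and p == predicate]

hits = evidence_for(triples, "AcmeCorp", "renewable_energy_share")
# Each hit carries its provenance, e.g. the figure on page 9.
```

Keeping provenance attached to every retrieved value is what makes the LLM's compliance scoring auditable rather than free-form generation.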
2025
MMLF: Multi-query Multi-passage Late Fusion Retrieval
Yuan-Ching Kuo | Yi Yu | Chih-Ming Chen | Chuan-Ju Wang
Findings of the Association for Computational Linguistics: NAACL 2025
Leveraging large language models (LLMs) for query expansion has proven highly effective across diverse tasks and languages. Yet challenges remain in optimizing query formatting and prompting, often with less attention paid to handling retrieval results. In this paper, we introduce Multi-query Multi-passage Late Fusion (MMLF), a straightforward yet potent pipeline that generates sub-queries, expands them into pseudo-documents, retrieves each one individually, and aggregates the results using reciprocal rank fusion. Our experiments on five BEIR benchmark datasets demonstrate that MMLF achieves superior performance, with an average improvement of 4% and a maximum gain of up to 8% in both Recall@1k and nDCG@10 over the state of the art.
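The final fusion step of the pipeline, reciprocal rank fusion (RRF), can be sketched as below. This is a generic RRF implementation, not the paper's code; the smoothing constant k=60 is the conventional default from the RRF literature, and the document IDs are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each ranking lists doc IDs best-first; a document scores
    1 / (k + rank) per list it appears in, summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-sub-query retrieval runs (one ranked list per sub-query).
runs = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = reciprocal_rank_fusion(runs)
# → ["d2", "d1", "d3", "d4"]
```

Because RRF operates only on ranks, it fuses retrieval runs from different sub-queries without needing their raw scores to be comparable.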