Minju Song
2025
DMIS Lab at ArchEHR-QA 2025: Evidence-Grounded Answer Generation for EHR-based QA via a Multi-Agent Framework
Hyeon Hwang | Hyeongsoon Hwang | Jongmyung Jung | Jaehoon Yun | Minju Song | Yein Park | Dain Kim | Taewhoo Lee | Jiwoong Sohn | Chanwoong Yoon | Sihyeon Park | Jiwoo Lee | Heechul Yang | Jaewoo Kang
BioNLP 2025 Shared Tasks
ETHIC: Evaluating Large Language Models on Long-Context Tasks with High Information Coverage
Taewhoo Lee | Chanwoong Yoon | Kyochul Jang | Donghyeon Lee | Minju Song | Hyunjae Kim | Jaewoo Kang
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Recent advancements in large language models (LLMs) capable of processing extremely long texts highlight the need for a dedicated evaluation benchmark to assess their long-context capabilities. However, existing methods, like the needle-in-a-haystack test, do not effectively assess whether these models fully utilize contextual information, raising concerns about the reliability of current evaluation techniques. To thoroughly examine the effectiveness of existing benchmarks, we introduce a new metric called information coverage (IC), which quantifies the proportion of the input context necessary for answering queries. Our findings indicate that current benchmarks exhibit low IC; although the input context may be extensive, the actual usable context is often limited. To address this, we present ETHIC, a novel benchmark designed to assess LLMs’ ability to leverage the entire context. Our benchmark comprises 1,986 test instances spanning four long-context tasks with high IC scores in the domains of books, debates, medicine, and law. Our evaluations reveal significant performance drops in contemporary LLMs, highlighting a critical challenge in managing long contexts. Our benchmark is available at https://github.com/dmis-lab/ETHIC.