SongEun Lee


2023

pdf
Towards Diverse and Effective Question-Answer Pair Generation from Children Storybooks
Sugyeong Eo | Hyeonseok Moon | Jinsung Kim | Yuna Hur | Jeongwook Kim | SongEun Lee | Changwoo Chun | Sungsoo Park | Heuiseok Lim
Findings of the Association for Computational Linguistics: ACL 2023

Recent advances in QA pair generation (QAG) have raised interest in applying this technique to the educational field. However, the diversity of QA types remains a challenge despite its contributions to comprehensive learning and assessment of children. In this paper, we propose a QAG framework that enhances QA type diversity by producing different interrogative sentences and implicit/explicit answers. Our framework comprises a QFS-based answer generator, an iterative QA generator, and a relevancy-aware ranker. The two generators aim to expand the number of candidates while covering various types. The ranker trained on the in-context negative samples clarifies the top-N outputs based on the ranking score. Extensive evaluations and detailed analyses demonstrate that our approach outperforms previous state-of-the-art results by significant margins, achieving improved diversity and quality. Our task-oriented processes are consistent with real-world demand, which highlights our system’s high applicability.

pdf
CReTIHC: Designing Causal Reasoning Tasks about Temporal Interventions and Hallucinated Confoundings
Changwoo Chun | SongEun Lee | Jaehyung Seo | Heuiseok Lim
Findings of the Association for Computational Linguistics: EMNLP 2023

Large language models (LLMs) have demonstrated impressive capabilities in natural language processing. However, their ability to establish causal relationships, particularly in the context of temporal interventions and language hallucinations, remains challenging. This paper presents CReTIHC, a novel dataset designed to test and enhance the causal reasoning abilities of LLMs. The dataset is constructed using a unique approach that incorporates elements of verbal hallucinations and temporal interventions through the reengineering of existing causal inference datasets. This transformation creates complex scenarios that push LLMs to critically evaluate the information presented and identify cause-and-effect relationships. The CReTIHC dataset serves as a pioneering tool for improving LLM’s causal inference capabilities, paving the way for a more nuanced understanding of causal relationships in natural language processing (NLP) tasks. The whole dataset is publicly accessible at: (https://github.com/ChangwooChun/CReTIHC)