Yun Joon Soh




2025

You Only Use Reactive Attention Slice When Retrieving From Long Context
Yun Joon Soh | Hanxian Huang | Yuandong Tian | Jishen Zhao
Findings of the Association for Computational Linguistics: EMNLP 2025

Retrieval-Augmented Generation is a powerful method for enhancing language models (LMs), but existing retrieval techniques are limited. Embedding-based methods are often inaccurate due to their reliance on lexical similarity, while neural retrievers are computationally expensive to train. To overcome these issues, we introduce You Only Use Reactive Attention slice (YOURA), a training-free and fine-tuning-free attention-based retrieval technique. When retrieving, YOURA uses a novel reaction score heuristic, which quantifies how an LM’s self-attention “reacts” to a user query. We also propose a sentence extraction algorithm to efficiently preprocess the context. Evaluations on three open-source LMs using the LongBench and BABILong datasets show YOURA’s effectiveness. Our framework improves QA task accuracy by up to 15% and inference throughput by up to 31% compared to embedding-based retrieval.
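
To make the idea of attention-based retrieval concrete, here is a minimal sketch of scoring context sentences by how strongly a model's self-attention "reacts" to a query. The abstract does not give YOURA's actual reaction score formula, so everything below is an illustrative assumption: the model choice (gpt2), the layer/head averaging, and the `reaction_score` helper are hypothetical, not the paper's algorithm.

```python
# Hypothetical sketch: rank context sentences by the attention mass that
# query tokens direct at sentence tokens when both are encoded together.
# This is NOT YOURA's reaction score; it only illustrates the general idea.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)
model.eval()

def reaction_score(sentence: str, query: str) -> float:
    """Mean attention flowing from query tokens back to sentence tokens."""
    sent_ids = tok(sentence, return_tensors="pt").input_ids
    query_ids = tok(query, return_tensors="pt").input_ids
    input_ids = torch.cat([sent_ids, query_ids], dim=1)
    with torch.no_grad():
        out = model(input_ids)
    # Average over layers, batch, and heads -> (seq_len, seq_len) matrix.
    attn = torch.stack(out.attentions).mean(dim=(0, 1, 2))
    n_sent = sent_ids.shape[1]
    # Rows = query positions, columns = sentence positions (causal model,
    # so later query tokens can attend to earlier sentence tokens).
    return attn[n_sent:, :n_sent].mean().item()

context = [
    "The capital of France is Paris.",
    "Bananas are rich in potassium.",
]
query = "What is the capital of France?"
# Keep the highest-reacting sentences for the generation prompt.
ranked = sorted(context, key=lambda s: reaction_score(s, query), reverse=True)
print(ranked[0])
```

Because the score is read directly off the model's own attention maps, a scheme like this needs no separate retriever training or fine-tuning, which matches the training-free property the abstract claims for YOURA.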