2025
ChronoBias: A Benchmark for Evaluating Temporal Group Bias in the Time-sensitive Knowledge of Large Language Models
Kyungmin Kim | Youngbin Choi | Hyounghun Kim | Dongwoo Kim | Sangdon Park
Findings of the Association for Computational Linguistics: EMNLP 2025
In this paper, we propose ChronoBias, a novel benchmark for evaluating time-conditional group bias in the time-sensitive knowledge of large language models (LLMs). Our benchmark is constructed via a template-based semi-automated generation method, balancing the quality-quantity trade-off in existing benchmark curation approaches. For knowledge that changes over time, time-conditional group bias exhibits varying patterns across time intervals, evident in both the best- and worst-performing groups and in the bias metric itself. In addition to parametric knowledge bias, which influences group bias across all time intervals, we identify time-sensitivity bias as an additional factor after a model's knowledge cutoff, accounting for much of the variation in time-conditional group bias over time. Since both biases are irreducible, retrieval-augmented generation (RAG) can be a promising approach, as it can address post-cutoff knowledge and better leverage pretraining knowledge that is underrepresented in the model parameters. While RAG improves both overall performance and group bias, we observe that the disparate patterns of time-conditional group bias still persist. Therefore, through extensive experiments with various model configurations, we illustrate how accurate and fair RAG-based LLMs should behave and provide actionable guidelines toward constructing such ideal models.
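Purely as an illustrative sketch (not the authors' released evaluation code), one natural form of a time-conditional group bias metric is the per-interval accuracy gap between the best- and worst-performing groups; the record fields used below ("interval", "group", "correct") are assumptions made for this example.

import collections

def time_conditional_group_bias(records):
    # records: iterable of dicts with hypothetical keys
    #   "interval": time interval the fact belongs to (e.g., "2019-2021")
    #   "group":    group label of the question's subject entity
    #   "correct":  whether the model answered correctly (bool)
    per_cell = collections.defaultdict(lambda: [0, 0])  # (interval, group) -> [correct, total]
    for r in records:
        cell = per_cell[(r["interval"], r["group"])]
        cell[0] += int(r["correct"])
        cell[1] += 1
    bias_by_interval = {}
    for interval in {key[0] for key in per_cell}:
        accs = {g: c / n for (i, g), (c, n) in per_cell.items() if i == interval}
        # best-vs-worst group accuracy gap within this time interval
        bias_by_interval[interval] = max(accs.values()) - min(accs.values())
    return bias_by_interval

Comparing this quantity across intervals before and after a model's knowledge cutoff is one way to surface the time-sensitivity effect the abstract describes.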
2024
Selective Perception: Learning Concise State Descriptions for Language Model Actors
Kolby Nottingham | Yasaman Razeghi | Kyungmin Kim | Jb Lanier | Pierre Baldi | Roy Fox | Sameer Singh
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
The latest large language models (LMs) support increasingly longer contexts. While this trend permits using substantial amounts of text with SOTA LMs, requiring these large LMs to process potentially redundant or irrelevant data needlessly increases inference time and cost. To remedy this problem, we propose BLINDER, a method that leverages a small finetuned LM to sample the minimal set of input features that maximizes the performance of a downstream LM. BLINDER trains an LM with a value head to estimate the likelihood of optimal outputs from a downstream LM given an input. We evaluate BLINDER on embodied decision-making tasks with notoriously verbose state descriptions: NetHack and robot planning. BLINDER reduces the length of LM actor input by 87% and 99% while improving task success rates by 158% and 54% on NetHack and robot planning, respectively, which represents substantial inference cost savings while actually increasing performance.
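The following is a minimal sketch of the selection step only, assuming a scoring function that stands in for BLINDER's finetuned value head; the function names, the greedy stopping rule, and the toy scorer are illustrative, not the released implementation.

def select_state_features(task, features, value_fn, budget):
    # Greedily keep the state features that value_fn scores highest.
    # value_fn(task, selected) -> float is assumed to approximate the
    # likelihood that the downstream LM actor succeeds given `selected`.
    selected, remaining = [], list(features)
    while remaining and len(selected) < budget:
        best_feat = max(remaining, key=lambda f: value_fn(task, selected + [f]))
        if value_fn(task, selected + [best_feat]) <= value_fn(task, selected):
            break  # no candidate improves the estimated value; stop early
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected

# toy stand-in for a learned value head: count task words appearing in the description
state = ["agent at (3,4)", "a newt corpse is here", "you see a locked door", "hp: 14/14"]
print(select_state_features("open the locked door", state,
                            lambda t, s: sum(w in " ".join(s) for w in t.split()), budget=2))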
2023
Pivotal Role of Language Modeling in Recommender Systems: Enriching Task-specific and Task-agnostic Representation Learning
Kyuyong Shin | Hanock Kwak | Wonjae Kim | Jisu Jeong | Seungjae Jung | Kyungmin Kim | Jung-Woo Ha | Sang-Woo Lee
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Recent studies have proposed unified user modeling frameworks that leverage user behavior data from various applications. Many of them benefit from utilizing users' behavior sequences as plain texts, representing rich information in any domain or system without losing generality. Hence, a question arises: Can language modeling on user history corpora help improve recommender systems? While its versatile usability has been widely investigated in many domains, its application to recommender systems remains underexplored. We show that language modeling applied directly to task-specific user histories achieves excellent results on diverse recommendation tasks. Also, leveraging additional task-agnostic user histories delivers significant performance benefits. We further demonstrate that our approach can provide promising transfer learning capabilities for a broad spectrum of real-world recommender systems, even on unseen domains and services.
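A hedged sketch of the basic recipe only: serializing a user's behavior history into plain text so that an off-the-shelf language model can be trained or prompted on it. The event fields and the separator below are hypothetical choices, not the paper's actual data format.

def serialize_history(events):
    # Render a behavior sequence as plain text, one event per clause.
    # `events` uses hypothetical keys: "timestamp", "action", "item".
    return " ; ".join(f"{e['timestamp']} {e['action']} {e['item']}" for e in events)

# example: a task-agnostic history rendered as text, ready to be fed to an LM
history = [
    {"timestamp": "2023-01-05", "action": "searched for", "item": "wireless earbuds"},
    {"timestamp": "2023-01-06", "action": "clicked", "item": "earbuds model X"},
    {"timestamp": "2023-01-09", "action": "purchased", "item": "earbuds model X"},
]
print(serialize_history(history))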