Sukru Samet Dindar
2025
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Xilin Jiang | Sukru Samet Dindar | Vishal Choudhari | Stephan Bickel | Ashesh Mehta | Guy M McKhann | Daniel Friedman | Adeen Flinker | Nima Mesgarani
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce intention-informed auditory scene understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention. By taking a first step toward intention-aware auditory AI, this work explores a new paradigm where listener perception informs machine listening, paving the way for future listener-centered auditory systems. Demo available.
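A minimal sketch of the two-stage pipeline the abstract describes: first decode the attended speaker from neural activity, then condition the auditory LLM's response on that attentional state. Everything here is an illustrative assumption, not the paper's actual method or API; the attention decoder below uses a classic envelope-correlation heuristic from the AAD literature as a stand-in, and `llm.generate` is a hypothetical interface.

```python
# Hypothetical sketch of an attention-driven auditory pipeline.
# Names (decode_attended_speaker, llm.generate, ...) are illustrative
# assumptions, not drawn from the AAD-LLM paper itself.
import numpy as np

def decode_attended_speaker(neural_envelope, speaker_envelopes):
    """Stage 1 (stand-in): pick the speaker whose speech envelope
    correlates best with the envelope reconstructed from iEEG -- a
    standard auditory-attention-decoding heuristic, not necessarily
    the decoder used in the paper."""
    scores = [np.corrcoef(neural_envelope, env)[0, 1]
              for env in speaker_envelopes]
    return int(np.argmax(scores))

def answer_with_attention(llm, mixture_audio, neural_envelope,
                          speaker_envelopes, question):
    """Stage 2: condition response generation on the inferred
    attentional state by exposing it to the auditory LLM."""
    attended = decode_attended_speaker(neural_envelope, speaker_envelopes)
    prompt = (f"The listener is attending to speaker {attended}. "
              f"{question}")
    return llm.generate(audio=mixture_audio, prompt=prompt)
```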
XCOMPS: A Multilingual Benchmark of Conceptual Minimal Pairs
Linyang He | Ercong Nie | Sukru Samet Dindar | Arsalan Firoozi | Van Nguyen | Corentin Puffay | Riki Shimizu | Haotian Ye | Jonathan Brennan | Helmut Schmid | Hinrich Schuetze | Nima Mesgarani
Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
In this work, we introduce XCOMPS, a multilingual conceptual minimal pair dataset that covers 17 languages. Using this dataset, we evaluate LLMs’ multilingual conceptual understanding through metalinguistic prompting, direct probability measurement, and neurolinguistic probing. We find that: 1) LLMs exhibit weaker conceptual understanding for low-resource languages, and accuracy varies across languages despite being tested on the same concept sets. 2) LLMs excel at distinguishing concept-property pairs that are visibly different but show a marked performance drop when negative pairs share subtle semantic similarities. 3) More morphologically complex languages yield lower conceptual understanding scores and require deeper layers for conceptual reasoning.
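A minimal sketch of "direct probability measurement" on a conceptual minimal pair: the model is scored correct when it assigns higher log-likelihood to the true concept-property sentence than to its minimally different negative. The model name and the example pair below are placeholders, not items from XCOMPS.

```python
# Sketch of minimal-pair scoring by direct probability measurement.
# "gpt2" and the example sentences are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(text: str) -> float:
    """Sum of token log-probabilities under the causal LM."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    token_lp = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

positive = "A knife is used for cutting."    # illustrative pair only
negative = "A knife is used for drinking."
correct = sentence_logprob(positive) > sentence_logprob(negative)
```

Averaged over all pairs in a language, this accuracy gives the per-language conceptual understanding score compared across the 17 languages.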