Seolhee Lee
2025
SPeCtrum: A Grounded Framework for Multidimensional Identity Representation in LLM-Based Agent
Keyeun Lee | Seo Hyeong Kim | Seolhee Lee | Jinsu Eun | Yena Ko | Hayeon Jeon | Esther Hehsun Kim | Seonghye Cho | Soeun Yang | Eun-mee Kim | Hajin Lim
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Existing methods for simulating individual identities often oversimplify human complexity, which may lead to incomplete or flattened representations. To address this, we introduce SPeCtrum, a grounded framework for constructing authentic LLM agent personas by incorporating an individual’s multidimensional self-concept. SPeCtrum integrates three core components: Social Identity (S), Personal Identity (P), and Personal Life Context (C), each contributing distinct yet interconnected aspects of identity. To evaluate SPeCtrum’s effectiveness in identity representation, we conducted automated and human evaluations. Automated evaluations using popular drama characters showed that Personal Life Context (C)—derived from short essays on preferences and daily routines—modeled characters’ identities more effectively than Social Identity (S) and Personal Identity (P) alone and performed comparably to the full SPC combination. In contrast, human evaluations involving real-world individuals found that the full SPC combination provided a more comprehensive self-concept representation than C alone. Our findings suggest that while C alone may suffice for basic identity simulation, integrating S, P, and C enhances the authenticity and accuracy of real-world identity representation. Overall, SPeCtrum offers a structured approach for simulating individuals in LLM agents, enabling more personalized human-AI interactions and improving the realism of simulation-based behavioral studies.
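To make the framework's structure concrete, here is a minimal Python sketch of how an SPC-style persona might be assembled into an agent's system prompt. The class, field names, and prompt wording are illustrative assumptions for this listing, not the authors' released code.

```python
# Sketch of combining the three SPeCtrum components (S, P, C) into one persona prompt.
# Field names and prompt text are assumptions, not the paper's actual implementation.
from dataclasses import dataclass


@dataclass
class SPCPersona:
    social_identity: str        # S: group memberships, roles, demographics
    personal_identity: str      # P: traits, values, preferences
    personal_life_context: str  # C: short essays on preferences and daily routines

    def to_system_prompt(self) -> str:
        """Merge the three identity components into a single role-play prompt."""
        return (
            "You are role-playing the following person.\n"
            f"Social identity: {self.social_identity}\n"
            f"Personal identity: {self.personal_identity}\n"
            f"Personal life context: {self.personal_life_context}"
        )


persona = SPCPersona(
    social_identity="graduate student; member of a local climbing club",
    personal_identity="introverted, detail-oriented, values independence",
    personal_life_context="Wakes early to study, prefers quiet cafes, spends weekends bouldering.",
)
print(persona.to_system_prompt())
```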
2024
PIAST: A Multimodal Piano Dataset with Audio, Symbolic and Text
Hayeon Bang | Eunjin Choi | Megan Finch | Seungheon Doh | Seolhee Lee | Gyeong-Hoon Lee | Juhan Nam
Proceedings of the 3rd Workshop on NLP for Music and Audio (NLP4MusA)
While piano music has become a significant area of study in Music Information Retrieval (MIR), there is a notable lack of datasets for piano solo music with text labels. To address this gap, we present PIAST (PIano dataset with Audio, Symbolic, and Text), a piano music dataset. Utilizing a piano-specific taxonomy of semantic tags, we collected 9,673 tracks from YouTube and added human annotations for 2,023 tracks by music experts, resulting in two subsets: PIAST-YT and PIAST-AT. Both include audio, text, tag annotations, and transcribed MIDI utilizing state-of-the-art piano transcription and beat tracking models. Among many possible tasks with the multimodal dataset, we conduct music tagging and retrieval using both audio and MIDI data and report baseline performances to demonstrate its potential as a valuable resource for MIR research.
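As an illustration of the multimodal structure described above, the following Python sketch represents a PIAST-style track record and a simple tag-based retrieval baseline. The schema, field names, and example entries are assumptions made for this listing, not the dataset's actual format.

```python
# Sketch of a PIAST-style multimodal track record and tag retrieval.
# Field names and example entries are illustrative assumptions, not the real schema.
from dataclasses import dataclass, field


@dataclass
class PiastTrack:
    track_id: str
    subset: str                  # "PIAST-YT" (YouTube-collected) or "PIAST-AT" (expert-annotated)
    audio_path: str              # path to the audio file
    midi_path: str               # transcribed MIDI from a piano transcription model
    text: str                    # free-text description
    tags: set[str] = field(default_factory=set)  # piano-specific semantic tags


def retrieve_by_tags(tracks: list[PiastTrack], query_tags: set[str]) -> list[PiastTrack]:
    """Rank tracks by how many query tags they share (a naive tag-retrieval baseline)."""
    scored = [(len(t.tags & query_tags), t) for t in tracks]
    return [t for score, t in sorted(scored, key=lambda x: -x[0]) if score > 0]


tracks = [
    PiastTrack("yt_0001", "PIAST-AT", "audio/yt_0001.wav", "midi/yt_0001.mid",
               "calm solo piano", {"calm", "ballad", "soft"}),
    PiastTrack("yt_0002", "PIAST-YT", "audio/yt_0002.wav", "midi/yt_0002.mid",
               "upbeat jazz piano", {"jazz", "swing", "bright"}),
]
print([t.track_id for t in retrieve_by_tags(tracks, {"calm", "soft"})])
```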