Xiaonan Wang
2025
AI Knows Where You Are: Exposure, Bias, and Inference in Multimodal Geolocation with KoreaGEO
Xiaonan Wang
|
Bo Shao
|
Hansaem Kim
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Recent advances in vision-language models (VLMs) have enabled accurate image-based geolocation, raising serious concerns about location privacy risks in everyday social media posts. Yet, a systematic evaluation of such risks is still lacking: existing benchmarks show coarse granularity, linguistic bias, and a neglect of multimodal privacy risks. To address these gaps, we introduce KoreaGEO, the first fine-grained, multimodal, and privacy-aware benchmark for geolocation, built on Korean street views. The benchmark covers four socio-spatial clusters and nine place types with rich contextual annotations and two captioning styles that simulate real-world privacy exposure. To evaluate mainstream VLMs, we design a three-path protocol spanning image-only, functional-caption, and high-risk-caption inputs, enabling systematic analysis of localization accuracy, spatial bias, and reasoning behavior. Results show that input modality exerts a stronger influence on localization precision and privacy exposure than model scale or architecture, with high-risk captions substantially boosting accuracy. Moreover, they highlight structural prediction biases toward core cities.
FLUID QA: A Multilingual Benchmark for Figurative Language Usage in Dialogue across English, Chinese, and Korean
Seoyoon Park
|
Hyeji Choi
|
Minseon Kim
|
Subin An
|
Xiaonan Wang
|
Gyuri Choi
|
Hansaem Kim
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Figurative language conveys stance, emotion, and social nuance, making its appropriate use essential in dialogue. While large language models (LLMs) often succeed in recognizing figurative expressions at the sentence level, their ability to use them coherently in conversation remains uncertain. We introduce FLUID QA, the first multilingual benchmark that evaluates figurative usage in dialogue across English, Korean, and Chinese. Each item embeds figurative choices into multi-turn contexts. To support interpretation, we include FLUTE-bi, a sentence-level diagnostic task. Results reveal a persistent gap: models that perform well on FLUTE-bi frequently fail on FLUID QA, especially in sarcasm and metaphor. These errors reflect systematic rhetorical confusion and limited discourse reasoning. FLUID QA provides a scalable framework for assessing usage-level figurative competence across languages.
2024
KULTURE Bench: A Benchmark for Assessing Language Model in Korean Cultural Context
Xiaonan Wang
|
Jinyoung Yeo
|
Joon-Ho Lim
|
Hansaem Kim
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation