Recent advances in vision-language models (VLMs) have enabled accurate image-based geolocation, raising serious concerns about location privacy risks in everyday social media posts. Yet a systematic evaluation of such risks is still lacking: existing benchmarks are coarse-grained, linguistically biased, and neglect multimodal privacy risks. To address these gaps, we introduce KoreaGEO, the first fine-grained, multimodal, and privacy-aware benchmark for geolocation, built on Korean street views. The benchmark covers four socio-spatial clusters and nine place types with rich contextual annotations and two captioning styles that simulate real-world privacy exposure. To evaluate mainstream VLMs, we design a three-path protocol spanning image-only, functional-caption, and high-risk-caption inputs, enabling systematic analysis of localization accuracy, spatial bias, and reasoning behavior. Results show that input modality exerts a stronger influence on localization precision and privacy exposure than model scale or architecture, with high-risk captions substantially boosting accuracy. The results also reveal structural prediction biases toward core cities.
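As an illustration of how localization accuracy could be compared across the three input paths, the following minimal sketch scores predictions by great-circle distance and reports the fraction that fall within a distance threshold, per path. The record layout, the path labels used as dictionary keys, the function names, and the 25 km threshold are all assumptions made for this example; the abstract does not specify the benchmark's actual metric or data format, and the coordinates below are toy values, not results.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_within(records, threshold_km):
    """Fraction of predictions within threshold_km of the true location, per input path.

    Each record is a dict with hypothetical keys:
      'path' : input path label, e.g. 'image_only'
      'pred' : (lat, lon) predicted by the model
      'true' : (lat, lon) ground truth
    """
    hits, totals = {}, {}
    for r in records:
        dist = haversine_km(*r["pred"], *r["true"])
        totals[r["path"]] = totals.get(r["path"], 0) + 1
        hits[r["path"]] = hits.get(r["path"], 0) + (dist <= threshold_km)
    return {path: hits[path] / totals[path] for path in totals}


# Toy records (illustrative only): approximate coordinates for Seoul and Busan.
SEOUL = (37.5665, 126.9780)
BUSAN = (35.1796, 129.0756)
records = [
    {"path": "image_only", "pred": BUSAN, "true": SEOUL},
    {"path": "high_risk_caption", "pred": SEOUL, "true": SEOUL},
]
scores = accuracy_within(records, threshold_km=25.0)
```

Reporting the same score at several thresholds (for example street, district, and city scale) would give a coarse-to-fine view of how much each input path sharpens the prediction, though the thresholds themselves would need to be chosen to match the benchmark's own granularity.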
Figurative language conveys stance, emotion, and social nuance, making its appropriate use essential in dialogue. While large language models (LLMs) often succeed in recognizing figurative expressions at the sentence level, their ability to use them coherently in conversation remains uncertain. We introduce FLUID QA, the first multilingual benchmark that evaluates figurative usage in dialogue across English, Korean, and Chinese. Each item embeds figurative choices into multi-turn contexts. To support interpretation, we include FLUTE-bi, a sentence-level diagnostic task. Results reveal a persistent gap: models that perform well on FLUTE-bi frequently fail on FLUID QA, especially in sarcasm and metaphor. These errors reflect systematic rhetorical confusion and limited discourse reasoning. FLUID QA provides a scalable framework for assessing usage-level figurative competence across languages.