Jungwhan Kim
2026
What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models
Dasol Choi | Guijin Son | Hanwool Lee | Minhyuk Kim | Hyunwoo Ko | Teabin Lim | Eungyeol Ahn | Jungwhan Kim | Seunghyeok Hong | Youngsook Song
Findings of the Association for Computational Linguistics: ACL 2026
Current vision-language benchmarks predominantly feature well-structured questions with clear, explicit prompts. However, real user queries are often informal and under-specified. Users naturally leave much unsaid, relying on images to convey context. We introduce HAERAE-Vision, a benchmark of 653 real-world visual questions from Korean online communities (a 0.76% survival rate from 86K candidates), each paired with an explicit rewrite, yielding 1,306 query variants in total. Evaluating 39 VLMs, we find that even state-of-the-art models (GPT-5, Gemini 2.5 Pro) achieve under 50% on the original queries. Crucially, query explicitation alone yields 8 to 22 point improvements, with smaller models benefiting most. We further show that even with web search, under-specified queries underperform explicit queries without search, revealing that current retrieval cannot compensate for what users leave unsaid. Our findings demonstrate that a substantial portion of VLM difficulty stems from natural query under-specification rather than model capability, highlighting a critical gap between benchmark evaluation and real-world deployment.
2021
Analysis of Zero-Shot Crosslingual Learning between English and Korean for Named Entity Recognition
Jongin Kim | Nayoung Choi | Seunghyun Lim | Jungwhan Kim | Soojin Chung | Hyunsoo Woo | Min Song | Jinho D. Choi
Proceedings of the 1st Workshop on Multilingual Representation Learning
This paper presents an English-Korean parallel dataset that collects 381K news articles, of which 1,400, comprising 10K sentences, are manually labeled for crosslingual named entity recognition (NER). The annotation guidelines for the two languages are developed in parallel, yielding inter-annotator agreement scores of 91% and 88% for English and Korean respectively, indicating high-quality annotation in our dataset. Three types of crosslingual learning approaches, direct model transfer, embedding projection, and annotation projection, are used to develop zero-shot Korean NER models. Our best model gives an F1-score of 51%, which is very encouraging considering the extremely distinct natures of these two languages. This is pioneering work that explores zero-shot crosslingual learning between English and Korean and provides rich parallel annotation for a core NLP task such as named entity recognition.