Minseo Kim
2025
ENGinius: A Bilingual LLM Optimized for Plant Construction Engineering
Wooseong Lee | Minseo Kim | Taeil Hur | Gyeong Hwan Jang | Woncheol Lee | Maro Na | Taeuk Kim
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Recent advances in large language models (LLMs) have drawn attention for their potential to automate and optimize processes across various sectors. However, the adoption of LLMs in the plant construction industry remains limited, mainly due to its highly specialized nature and the lack of resources for domain-specific training and evaluation. In this work, we propose ENGinius, the first LLM designed for plant construction engineering. We present procedures for data construction and model training, along with the first benchmarks tailored to this underrepresented domain. We show that ENGinius delivers optimized responses to plant engineers by leveraging enriched domain knowledge. We also demonstrate its practical impact and use cases, such as technical document processing and multilingual communication.
2024
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
Jiwan Chung | Sungjae Lee | Minseo Kim | Seungju Han | Ashkan Yousefpour | Jack Hessel | Youngjae Yu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Visual arguments, often used in advertising or social causes, rely on images to persuade viewers to do or believe something. Understanding these arguments requires selective vision: only specific visual stimuli within an image are relevant to the argument, and relevance can only be understood within the context of a broader argumentative structure. While visual arguments are readily appreciated by human audiences, we ask: are today's AI capable of similar understanding? We present VisArgs, a dataset of 1,611 images annotated with 5,112 visual premises (with regions), 5,574 commonsense premises, and reasoning trees connecting them into structured arguments. We propose three tasks for evaluating visual argument understanding: premise localization, premise identification, and conclusion deduction. Experiments show that 1) machines struggle to capture visual cues: GPT-4-O achieved 78.5% accuracy, while humans reached 98.0%. Models also performed 19.5% worse when distinguishing between irrelevant objects within the image compared to external objects. 2) Providing relevant visual premises improved model performance significantly.