Minseo Kim
2025
ENGinius: A Bilingual LLM Optimized for Plant Construction Engineering
Wooseong Lee | Minseo Kim | Taeil Hur | Gyeong Hwan Jang | Woncheol Lee | Maro Na | Taeuk Kim
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)
Recent advances in large language models (LLMs) have drawn attention for their potential to automate and optimize processes across various sectors. However, the adoption of LLMs in the plant construction industry remains limited, mainly due to its highly specialized nature and the lack of resources for domain-specific training and evaluation. In this work, we propose ENGinius, the first LLM designed for plant construction engineering. We present procedures for data construction and model training, along with the first benchmarks tailored to this underrepresented domain. We show that ENGinius delivers optimized responses to plant engineers by leveraging enriched domain knowledge. We also demonstrate its practical impact and use cases, such as technical document processing and multilingual communication.
2024
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding
Jiwan Chung | Sungjae Lee | Minseo Kim | Seungju Han | Ashkan Yousefpour | Jack Hessel | Youngjae Yu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Visual arguments, often used in advertising or social causes, rely on images to persuade viewers to do or believe something. Understanding these arguments requires selective vision: only specific visual stimuli within an image are relevant to the argument, and relevance can only be understood within the context of a broader argumentative structure. While visual arguments are readily appreciated by human audiences, we ask: are today's AI capable of similar understanding? We present VisArgs, a dataset of 1,611 images annotated with 5,112 visual premises (with regions), 5,574 commonsense premises, and reasoning trees connecting them into structured arguments. We propose three tasks for evaluating visual argument understanding: premise localization, premise identification, and conclusion deduction. Experiments show that 1) machines struggle to capture visual cues: GPT-4-O achieved 78.5% accuracy, while humans reached 98.0%. Models also performed 19.5% worse when distinguishing between irrelevant objects within the image compared to external objects. 2) Providing relevant visual premises improved model performance significantly.