Hye-young Paik

2026

HiSA: Hierarchical State Abstraction for Scalable GUI Agents
Weiming Li | Hye-young Paik | Yulei Sui
Findings of the Association for Computational Linguistics: ACL 2026

Multimodal GUI agents generally operate on raw visual and textual observations, which creates a fundamental scalability challenge. While current state-of-the-art frameworks predominantly rely on inference-intensive test-time scaling or the accumulation of unbounded raw logs to maintain task coherence, we attribute the underlying bottleneck to insufficient state abstraction. To address this, we propose HiSA, a hierarchical state abstraction approach that actively restructures knowledge rather than passively retaining historical information, organizing raw histories into a three-level hierarchy of abstracted steps, refined contexts, and induced patterns. By synthesizing high-dimensional observations into compact semantic states, HiSA decouples reasoning efficacy from context length, enabling precise and scalable decision-making as interaction histories grow. Evaluated on Spider2-V, our approach establishes a new state of the art, achieving a 40.58% success rate while reducing token consumption by 69.85% and monetary costs by 55.10% compared to the best-performing baseline.

Evaluating Customized vs. Generalist Transformer-based Models for Legal Contract Classification
Amrita Singh | H. Suhan Karaca | Aditya Joshi | Hye-young Paik | Jiaojiao Jiang
Proceedings of the Second Workshop on Customizable NLP: Progress and Challenges in Customizing NLP for a Domain, Application, Group, or Individual (CustomNLP4U)

Despite advances in legal NLP, no comprehensive evaluation of Transformer-based models customized for legal tasks (referred to as ‘legal-specific’ models in this paper) exists for contract classification tasks. To address this gap, we present an evaluation of 13 legal-specific Transformer-based models on 3 English-language contract classification tasks and compare them with 9 generalist models. The results show that legal-specific models consistently outperform generalist models, especially on tasks requiring nuanced legal understanding. They also help reduce misclassification of rare classes in imbalanced datasets. Legal-BERT and Contracts-BERT establish new state-of-the-art results on two of the three tasks, despite having 69% fewer parameters than the best-performing generalist models. We also identify CaseLaw-BERT and LexLM as strong additional baselines for contract classification. Our results highlight the shortcomings of generalist models, emphasizing the need for domain-specific customization, particularly in the context of legal applications.

