Yuhang Liu

Papers on this page may belong to the following people: Yuhang Liu, Yuhang Liu

2026

Computer-aided design (CAD) is crucial for prototyping complex 3D objects through precise geometric modeling. In practical design workflows, designers manually define assembly sequences for individual CAD parts, a process that is both time-consuming and expertise-intensive. To address this challenge, we formulate CAD assembly as a parametric action prediction task: given a reference design image and disassembled parts, the model predicts 6-DoF transformations (i.e., actions) to progressively assemble each part. This paradigm enables multimodal large language models (MLLMs) to solve the task through autoregressive action generation. While recent MLLMs demonstrate promising spatial reasoning, they struggle with fine-grained geometric structure understanding and physical collision avoidance during assembly. In this paper, we propose CADMate, an MLLM-based framework for sequential CAD assembly action generation. Our training strategy comprises three stages: (i) CAD domain adaptation for spatial geometry and position understanding, (ii) supervised fine-tuning with geometric chain-of-thought (CoT) reasoning for action generation, and (iii) reinforcement learning with spatial-physical rewards that jointly optimize spatial accuracy and collision avoidance. Additionally, we construct the CADBuilder dataset, comprising over 45K CAD assemblies with annotated action sequences. Our experiments demonstrate that CADMate significantly outperforms existing prominent MLLMs (e.g., GPT-5), showing great potential in design applications.

2025


Document-level Simplification and Illustration Generation Multimodal Coherence
Yuhang Liu | Mo Zhang | Zhaoyi Cheng | Sarah Ebling
Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)

We present a novel method for document-level text simplification and automatic illustration generation aimed at enhancing information accessibility for individuals with cognitive impairments. While prior research has primarily focused on sentence- or paragraph-level simplification and on text-to-image generation for narrative contexts, this work addresses the unique challenges of simplifying long-form documents and generating semantically aligned visuals. The pipeline consists of three stages: (1) discourse-aware segmentation using large language models, (2) visually grounded description generation via abstraction, and (3) controlled image synthesis using state-of-the-art diffusion models, including DALL·E 3 and FLUX.1-dev. We further incorporate stylistic constraints to ensure visual coherence, and we conduct a human evaluation measuring comprehension, semantic alignment, and visual clarity. Experimental results demonstrate that our method effectively combines simplified text with generated illustrations, enhancing textual accessibility.

2024

Large Language Models (LLMs) demonstrate significant value in domain-specific applications, benefiting from their fundamental capabilities. Nevertheless, it remains unclear which fundamental capabilities contribute to success in specific domains. Moreover, existing benchmark-based evaluation cannot effectively reflect performance in real-world applications. In this survey, we review recent advances of LLMs in domain-specific applications, aiming to summarize their fundamental capabilities and how these capabilities work together. Furthermore, we establish connections between fundamental capabilities and specific domains, evaluating the varying importance of different capabilities. Based on our findings, we propose a reliable strategy for selecting more robust backbone LLMs for real-world domain applications.

2023


Graph vs. Sequence: An Empirical Study on Knowledge Forms for Knowledge-Grounded Dialogue
Yizhe Yang | Heyan Huang | Yuhang Liu | Yang Gao
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Knowledge-grounded dialogue is the task of generating an informative response based on both the dialogue history and an external knowledge source. In general, there are two forms of knowledge: manually annotated knowledge graphs and knowledge text from websites. From various evaluation viewpoints, each type of knowledge has advantages and downsides. To disentangle the underlying principles and determining factors from these intricate influences, we conduct a thorough experimental study on the task to answer three essential questions. The questions concern the choice of an appropriate knowledge form, the degree of mutual influence between knowledge and model selection, and the few-shot performance of knowledge. Supported by statistical evidence, we offer conclusive answers and sensible suggestions on directions and standards for future research.