Yanxuan Yu


2025

MT2ST: Adaptive Multi-Task to Single-Task Learning
Dong Liu | Yanxuan Yu
Proceedings of the 1st Workshop on Multimodal Augmented Generation via Multimodal Retrieval (MAGMaR 2025)

We propose MT2ST, a general and efficient framework for accelerating multi-task training by progressively transitioning to single-task optimization. Unlike conventional multi-task learning (MTL) or single-task fine-tuning (STL), MT2ST dynamically adjusts the training focus via two complementary strategies: Diminish, which gradually down-weights auxiliary losses, and Switch, which explicitly switches to the primary task at a scheduled point. We demonstrate the effectiveness of MT2ST across three key paradigms: representation learning, transformers, and diffusion models, covering both unimodal (text/image) and multimodal (vision-language) tasks. Extensive experiments show that MT2ST significantly improves training efficiency—achieving up to 56% FLOPs compression—while maintaining or surpassing task performance. These results position MT2ST as a general-purpose solution for scalable and adaptive multi-task training. Although MT2ST is general-purpose, it is especially well suited to multimodal settings such as VQA or vision-language retrieval, where auxiliary pretraining objectives (e.g., masked language modeling or contrastive learning) often diverge from the final objective. We include a VQA case study and outline the framework's efficiency gains for multimodal retrieval.
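To make the two strategies concrete, here is a minimal sketch of the auxiliary-loss scheduling the abstract describes. The function names, the exponential decay form, and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import math

def aux_loss_weight(step, total_steps, switch_step=None, decay=5.0):
    """Illustrative weight schedule for the two MT2ST strategies
    (all names and the decay shape are our own assumptions):
    - Switch: keep full auxiliary weight, then drop it to zero at a
      scheduled step and continue single-task from there.
    - Diminish: smoothly down-weight auxiliary losses over training.
    """
    if switch_step is not None:            # Switch strategy
        return 1.0 if step < switch_step else 0.0
    progress = step / max(total_steps, 1)  # Diminish strategy
    return math.exp(-decay * progress)

def mt2st_loss(primary_loss, aux_losses, step, total_steps, **kw):
    """Combine the primary loss with the scheduled auxiliary losses."""
    w = aux_loss_weight(step, total_steps, **kw)
    return primary_loss + w * sum(aux_losses)

# Example: Diminish decays the weight smoothly; Switch cuts it off
# entirely once step 1000 is reached.
w_diminish = aux_loss_weight(step=500, total_steps=2000)
w_switch = aux_loss_weight(step=1500, total_steps=2000, switch_step=1000)
```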

HSGM: Hierarchical Segment-Graph Memory for Scalable Long-Text Semantics
Dong Liu | Yanxuan Yu
Proceedings of the 14th Joint Conference on Lexical and Computational Semantics (*SEM 2025)

Semantic parsing of long documents remains challenging due to quadratic growth in pairwise composition and memory requirements. We introduce Hierarchical Segment-Graph Memory (HSGM), a novel framework that decomposes an input of length N into M meaningful segments, constructs Local Semantic Graphs on each segment, and extracts compact summary nodes to form a Global Graph Memory. HSGM supports incremental updates—only newly arrived segments incur local graph construction and summary-node integration—while Hierarchical Query Processing locates relevant segments via top-K retrieval over summary nodes and then performs fine-grained reasoning within their local graphs. Theoretically, HSGM reduces worst-case complexity from O(N²) to O(Nk + (N/k)²), with segment size k ≪ N, and we derive Frobenius-norm bounds on the approximation error introduced by node summarization and sparsification thresholds. Empirically, on three benchmarks—long-document AMR parsing, segment-level semantic role labeling (OntoNotes), and legal event extraction—HSGM achieves 2–4× inference speedup, >60% reduction in peak memory, and ≥95% of baseline accuracy. Our approach unlocks scalable, accurate semantic modeling for ultra-long texts, enabling real-time and resource-constrained NLP applications.
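The complexity claim follows from the hierarchy: with M = N/k segments, building a local graph over each segment's k tokens costs O(k²) per segment, or O(Nk) in total, while the global memory connects only the M summary nodes at O((N/k)²). The sketch below illustrates the two-stage Hierarchical Query Processing described in the abstract; the data layout (one embedding per summary node, a list of per-segment local graphs) and every name here are simplifying assumptions for illustration, not the authors' code.

```python
import numpy as np

def hierarchical_query(query_vec, summary_vecs, local_graphs, top_k=3):
    """Illustrative two-stage lookup in the spirit of HSGM:
    1. Coarse stage: score the query against one summary embedding
       per segment and keep the top-K segments.
    2. Fine stage: hand back those segments' local semantic graphs
       for fine-grained reasoning.
    summary_vecs: (M, d) array, one summary embedding per segment.
    local_graphs: list of M per-segment graphs (any representation).
    """
    # Cosine similarity between the query and each segment summary.
    q = query_vec / np.linalg.norm(query_vec)
    s = summary_vecs / np.linalg.norm(summary_vecs, axis=1, keepdims=True)
    scores = s @ q
    # Indices of the top-K best-matching segments.
    top = np.argsort(scores)[::-1][:top_k]
    return [(int(i), local_graphs[int(i)]) for i in top]
```

Because only the M = N/k summary vectors are scored before any local reasoning, the coarse stage costs O(N/k) per query rather than O(N), which is where the retrieval-time savings come from.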