This paper presents Adept-SQL, a domain-adapted Text2SQL system that addresses critical deployment challenges in professional fields. While modern LLM-based solutions excel on academic benchmarks, we identify three persistent limitations in industrial application: domain-specific knowledge barriers, the complexity of real-world schemas, and the prohibitive computational costs of large LLMs. Our framework introduces two key innovations: a three-stage grounding mechanism that combines dynamic terminology expansion, focused schema alignment, and historical query retrieval; and a hybrid prompting architecture that decomposes SQL generation into schema-aware hinting, term disambiguation, and few-shot example incorporation phases. This approach enables efficient execution with smaller open-source LLMs while maintaining semantic precision. Deployed in the petroleum engineering domain, our system achieves 97% execution accuracy on real-world databases, a 49% absolute improvement over SOTA baselines. We release our implementation code to advance research on professional Text2SQL systems.
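To make the three-stage grounding and hybrid prompting concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is an illustration only: the class and function names (`GroundingContext`, `ground`, `build_prompt`), the lexical-overlap retrieval, and the prompt layout are all assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a three-stage grounding pipeline followed by a
# hybrid prompt builder; all names and heuristics are illustrative.
from dataclasses import dataclass

@dataclass
class GroundingContext:
    expanded_terms: list    # stage 1: dynamic terminology expansion
    relevant_schema: list   # stage 2: focused schema alignment
    similar_queries: list   # stage 3: historical query retrieval

def ground(question: str, lexicon: dict, schema: list, history: list) -> GroundingContext:
    # Stage 1: map domain jargon (e.g., petroleum abbreviations) to canonical terms.
    expanded = [lexicon.get(tok, tok) for tok in question.split()]
    # Stage 2: keep only schema elements that overlap the expanded question.
    relevant = [s for s in schema if any(t.lower() in s.lower() for t in expanded)]
    # Stage 3: retrieve past (question, SQL) pairs by naive lexical overlap.
    scored = sorted(history, key=lambda qa: -len(set(qa[0].split()) & set(expanded)))
    return GroundingContext(expanded, relevant, scored[:3])

def build_prompt(question: str, ctx: GroundingContext) -> str:
    # Hybrid prompt: schema-aware hints, disambiguated terms, then few-shot examples.
    shots = "\n".join(f"Q: {q}\nSQL: {sql}" for q, sql in ctx.similar_queries)
    return (f"Schema: {', '.join(ctx.relevant_schema)}\n"
            f"Terms: {' '.join(ctx.expanded_terms)}\n"
            f"{shots}\nQ: {question}\nSQL:")

# Usage with toy inputs (all data here is made up for illustration):
ctx = ground("show WOR for well A-12",
             {"WOR": "water_oil_ratio"},
             ["wells.water_oil_ratio", "wells.name"],
             [("show water_oil_ratio for well B-7",
               "SELECT water_oil_ratio FROM wells WHERE name = 'B-7'")])
print(build_prompt("show WOR for well A-12", ctx))
```

A grounded, compact prompt like this is what would let a smaller open-source LLM stay precise without seeing the full schema.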
Research on Text Classification (TC) based on graph neural networks (GNNs) is on the rise, and both inductive and transductive methods have made significant progress. For transductive methods, semantic interaction between texts plays a crucial role in learning effective text representations. However, it is difficult to perform inductive learning while modeling interactions between texts on the graph. As a universal solution, we propose PaSIG, a graph neural network based on pre-trained semantic interaction. First, we construct a text-word heterogeneous graph and design an asymmetric structure to ensure one-way message passing from words to test texts. Meanwhile, we use the contextual representation capability of a pre-trained language model to construct node features that carry classification semantic information. We then explore adaptive aggregation methods with a gated fusion mechanism. Extensive experiments on five datasets demonstrate the effectiveness of PaSIG, with accuracy exceeding the baselines by 2.7% on average. While achieving state-of-the-art performance, we also employ subgraph sampling and intermediate state preservation to achieve fast inference.
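The gated fusion idea can be sketched as a small PyTorch module: a learned gate decides, per dimension, how much GNN-aggregated neighbor information to mix into a node's pre-trained feature. This is a minimal sketch under that reading of the abstract, not the paper's code; the module name and tensor shapes are assumptions.

```python
# Minimal PyTorch sketch of a gated fusion between a pre-trained (PLM) node
# feature and its GNN-aggregated message; names are illustrative.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_plm: torch.Tensor, h_agg: torch.Tensor) -> torch.Tensor:
        # Sigmoid gate computed from both views; g close to 1 admits more
        # neighbor (word-to-text) information, g close to 0 keeps the PLM feature.
        g = torch.sigmoid(self.gate(torch.cat([h_plm, h_agg], dim=-1)))
        return g * h_agg + (1 - g) * h_plm

# Usage: fuse 768-d PLM features with aggregated word-node messages for 4 texts.
fusion = GatedFusion(768)
out = fusion(torch.randn(4, 768), torch.randn(4, 768))
```

Because messages flow one way (words to test texts), such a gate also fits inductive inference: a new text node only needs its own PLM feature plus messages from already-embedded word nodes.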
Large action models are essential for enabling autonomous agents to perform complex tasks. However, training such models remains challenging due to the diversity of agent environments and the complexity of noisy agentic data. Existing infrastructure offers limited support for scalable, agent-specific fine-tuning and standardized agent data processing. We introduce ActionStudio, a lightweight and extensible data and training framework designed for large action models. ActionStudio unifies diverse agent trajectories through our proposed Unified Format 2.0, supports a range of training workflows with an optimized multi-node distributed setup, and integrates robust preprocessing and real-time verification tools. ActionStudio achieves up to 9× higher throughput than existing agentic training frameworks, and our trained models achieve top performance across public and realistic agent benchmarks. To support the broader research community, we open-source the ActionStudio framework and release actionstudio-98k, a curated dataset of 98k high-quality trajectories.
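As a rough illustration of what "unifying diverse agent trajectories" involves, the sketch below shows one plausible shape for a normalized trajectory record. The abstract does not specify the Unified Format 2.0 schema, so every field name here is an assumption made for illustration only.

```python
# Illustrative (assumed) trajectory record for unifying heterogeneous agent
# data; the real Unified Format 2.0 schema may differ entirely.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class Step:
    thought: str       # the agent's reasoning at this step
    action: str        # tool or function invocation
    observation: str   # environment feedback

@dataclass
class Trajectory:
    env: str                       # source environment name
    task: str                      # natural-language task description
    steps: list = field(default_factory=list)
    success: bool = False          # verification outcome after preprocessing

# Usage with a toy trajectory:
traj = Trajectory(env="web_shop", task="buy a red mug under $10")
traj.steps.append(Step("search for mugs", "search[red mug]", "10 results"))
print(json.dumps(asdict(traj), indent=2))
```

Normalizing every environment's logs into one such record is what makes shared preprocessing, verification, and multi-node training pipelines possible.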
The rapid adoption of Large Language Models (LLMs) as intelligent agents has underscored the necessity for robust evaluation frameworks capable of assessing agent performance in realistic, interactive environments. Existing evaluation methodologies often suffer from limitations such as static task benchmarks, limited scope, and inadequate integration with practical applications. In response, we introduce MCPEval, an open-source, Model Context Protocol (MCP)-based evaluation framework tailored for comprehensive and systematic assessment of LLM-powered agents. MCPEval standardizes evaluations across diverse domains through automated task generation and verification, supports multiple performance metrics, and integrates seamlessly with native agent capabilities. We empirically validate the effectiveness of MCPEval across five distinct real-world domains, revealing significant performance variations across LLM architectures and prompting strategies. Our results illustrate the framework's capacity to uncover nuanced performance patterns and identify domain-specific strengths and weaknesses, providing valuable insights beyond traditional binary success metrics. We publicly release MCPEval to foster reproducible research and promote standardized evaluation practices within the LLM agent community.
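The generate-run-verify loop the abstract describes can be sketched in a few lines of Python. Every name below (`generate_tasks`, `run_agent`, `verify`) is a hypothetical placeholder standing in for MCPEval's real components, and the stubs fabricate trivial tasks purely so the sketch runs end to end.

```python
# Hedged sketch of an automated task-generation / verification evaluation
# loop; all functions are assumed placeholders, not MCPEval's actual API.

def generate_tasks(domain: str, n: int) -> list:
    # MCPEval automates this step per domain; here we fabricate trivial tasks.
    return [{"id": i, "domain": domain, "goal": f"task-{i}"} for i in range(n)]

def run_agent(agent, task: dict) -> list:
    # The agent would act through native MCP tool calls; stubbed as one step.
    return [agent(task["goal"])]

def verify(task: dict, trajectory: list) -> dict:
    # Automated verification of the trajectory against the task specification.
    return {"id": task["id"], "passed": bool(trajectory)}

def evaluate(agent, domain: str, n_tasks: int = 5) -> dict:
    results = [verify(t, run_agent(agent, t)) for t in generate_tasks(domain, n_tasks)]
    # Keep per-task results, not just an aggregate pass rate, so that
    # domain-specific strengths and weaknesses remain visible.
    return {"success_rate": sum(r["passed"] for r in results) / len(results),
            "per_task": results}

print(evaluate(lambda goal: f"did {goal}", "finance"))
```

Retaining per-task records rather than a single binary score is what allows the nuanced, domain-level analysis the abstract emphasizes.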