Tianyi Ma
2026
AgentRouter: A Knowledge-Graph-Guided LLM Router for Collaborative Multi-Agent Question Answering
Zheyuan Zhang | Kaiwen Shi | Zhengqing Yuan | Zehong Wang | Tianyi Ma | Keerthiram Murugesan | Vincent Galassi | Chuxu Zhang | Yanfang Ye
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Large language models (LLMs) and agent-based frameworks have advanced rapidly, enabling diverse applications. Yet, with the proliferation of models and agentic strategies, practitioners face substantial uncertainty in selecting the best configuration for a downstream task. Prior studies show that different agents and backbones exhibit complementary strengths, and that larger models are not always superior, underscoring the need for adaptive routing mechanisms. Existing approaches to agent routing, however, often emphasize cost efficiency while overlooking the fine-grained contextual and relational structure inherent in QA tasks. In this paper, we propose AgentRouter, a framework that formulates multi-agent QA as a knowledge-graph–guided routing problem supervised by empirical performance signals. Specifically, we convert each QA instance into a heterogeneous knowledge graph that jointly encodes queries, contextual entities, and agents, and then train a heterogeneous graph neural network (GNN) to propagate information across node types and produce task-aware routing distributions over agents. By leveraging soft supervision and weighted aggregation of agent outputs, AgentRouter learns principled collaboration schemes that capture the complementary strengths of diverse agents. Extensive experiments demonstrate that our framework consistently outperforms single-agent and ensemble baselines, while generalizing across benchmarks and LLM backbones. These results highlight the effectiveness and robustness of graph-supervised multi-agent routing for question answering. Our code repo is available at https://anonymous.4open.science/r/AgentRouter.
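The weighted aggregation of agent outputs mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the AgentRouter code: the function names, the softmax normalization, and the vote-by-answer rule are assumptions chosen for illustration, and the router scores here are plain numbers standing in for whatever the trained GNN would produce.

```python
import math
from collections import defaultdict


def softmax(scores):
    """Turn raw router scores into a probability distribution over agents."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(answers, router_scores):
    """Weighted vote: each agent's answer counts with its routing weight.

    `answers` and `router_scores` are parallel lists, one entry per agent.
    Both the inputs and this voting rule are hypothetical stand-ins.
    """
    weights = softmax(router_scores)
    votes = defaultdict(float)
    for answer, weight in zip(answers, weights):
        votes[answer] += weight
    best = max(votes, key=votes.get)
    return best, dict(votes)


if __name__ == "__main__":
    # Three hypothetical agents; the second has the highest individual score,
    # but the two agents that agree on "A" outweigh it together.
    best, votes = aggregate(["A", "B", "A"], [0.0, 0.5, 0.0])
    print(best, votes)
```

In this toy case the two agreeing agents jointly outweigh the single highest-scored agent, which is the kind of complementary behavior that a routing distribution (as opposed to hard selection of one agent) allows.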
2025
Can LLMs Convert Graphs to Text-Attributed Graphs?
Zehong Wang | Sidney Liu | Zheyuan Zhang | Tianyi Ma | Chuxu Zhang | Yanfang Ye
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)
Graphs are ubiquitous structures found in numerous real-world applications, such as drug discovery, recommender systems, and social network analysis. To model graph-structured data, graph neural networks (GNNs) have become a popular tool. However, existing GNN architectures encounter challenges in cross-graph learning where multiple graphs have different feature spaces. To address this, recent approaches introduce text-attributed graphs (TAGs), where each node is associated with a textual description, which can be projected into a unified feature space using textual encoders. While promising, this method relies heavily on the availability of text-attributed graph data, which is difficult to obtain in practice. To bridge this gap, we propose a novel method named Topology-Aware Node description Synthesis (TANS), leveraging large language models (LLMs) to convert existing graphs into text-attributed graphs. The key idea is to integrate topological information into LLMs to explain how graph topology influences node semantics. We evaluate our TANS on text-rich, text-limited, and text-free graphs, demonstrating its applicability. Notably, on text-free graphs, our method significantly outperforms existing approaches that manually design node features, showcasing the potential of LLMs for preprocessing graph-structured data in the absence of textual information. The code and data are available at https://github.com/Zehong-Wang/TANS.
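As a rough illustration of integrating topological information into an LLM prompt, the sketch below computes two simple node properties and places them in a text prompt. It is not the TANS implementation: the choice of properties (degree and local clustering coefficient), the function names, and the prompt wording are all assumptions made for this example, and no LLM call is included.

```python
def degree(adj, node):
    """Number of neighbors of `node` in an undirected adjacency dict."""
    return len(adj[node])


def clustering(adj, node):
    """Local clustering coefficient: fraction of neighbor pairs that are linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def build_prompt(adj, node):
    """Hypothetical prompt template that exposes topology to a language model."""
    return (
        f"Node {node} has degree {degree(adj, node)} and "
        f"clustering coefficient {clustering(adj, node):.2f}. "
        "Describe the likely role of this node in the graph."
    )


if __name__ == "__main__":
    # Small undirected toy graph stored as node -> set of neighbors.
    adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
    print(build_prompt(adj, 0))
```

The text returned by a model for such a prompt could then serve as the node's description on a graph that has no text of its own, which is the setting the abstract calls text-free.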
PsyScam: A Benchmark for Psychological Techniques in Real-World Scams
Shang Ma | Tianyi Ma | Jiahao Liu | Wei Song | Zhenkai Liang | Xusheng Xiao | Yanfang Ye
Findings of the Association for Computational Linguistics: EMNLP 2025
Over the years, online scams have grown dramatically, with nearly 50% of global consumers encountering scam attempts each week. These scams cause not only significant financial losses to individuals and businesses, but also lasting psychological trauma, largely due to scammers’ strategic employment of psychological techniques (PTs) to manipulate victims. Meanwhile, scammers continually evolve their tactics by leveraging advances in Large Language Models (LLMs) to generate diverse scam variants that easily bypass existing defenses. To address this pressing problem, we introduce PsyScam, a benchmark designed to systematically capture the PTs employed in real-world scam reports, and investigate how LLMs can be utilized to generate variants of scams based on the PTs and the contexts provided by these scams. Specifically, we collect a wide range of scam reports and ground its annotations of employed PTs in well-established cognitive and psychological theories. We further demonstrate LLMs’ capabilities in generating through two downstream tasks: scam completion, and scam augmentation. Experimental results show that PsyScam presents significant challenges to existing models in both detecting and generating scam content based on the PTs used by real-world scammers. Our code and dataset are available.
NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning
Zheyuan Zhang | Yiyang Li | Nhi Ha Lan Le | Zehong Wang | Tianyi Ma | Vincent Galassi | Keerthiram Murugesan | Nuno Moniz | Werner Geyer | Nitesh V Chawla | Chuxu Zhang | Yanfang Ye
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Diet plays a critical role in human health, yet tailoring dietary reasoning to individual health conditions remains a major challenge. Nutrition Question Answering (QA) has emerged as a popular method for addressing this problem. However, current research faces two critical limitations. On one hand, the absence of datasets involving user-specific medical information severely limits personalization. This challenge is further compounded by the wide variability in individual health needs. On the other hand, while large language models (LLMs), a popular solution for this task, demonstrate strong reasoning abilities, they struggle with the domain-specific complexities of personalized healthy dietary reasoning, and existing benchmarks fail to capture these challenges. To address these gaps, we introduce the Nutritional Graph Question Answering (NGQA) benchmark, the first graph question answering dataset designed for personalized nutritional health reasoning. NGQA leverages data from the National Health and Nutrition Examination Survey (NHANES) and the Food and Nutrient Database for Dietary Studies (FNDDS) to evaluate whether a food is healthy for a specific user, supported by explanations of the key contributing nutrients. The benchmark incorporates three question complexity settings and evaluates reasoning across three downstream tasks. Extensive experiments with LLM backbones and baseline models demonstrate that the NGQA benchmark effectively challenges existing models. In sum, NGQA addresses a critical real-world problem while advancing GraphQA research with a novel domain-specific benchmark. Our codebase and dataset are available here.
LLM-Empowered Class Imbalanced Graph Prompt Learning for Online Drug Trafficking Detection
Tianyi Ma | Yiyue Qian | Zehong Wang | Zheyuan Zhang | Chuxu Zhang | Yanfang Ye
Findings of the Association for Computational Linguistics: ACL 2025
As the market for illicit drugs remains extremely profitable, major online platforms have become direct-to-consumer intermediaries for illicit drug trafficking participants. These online activities raise significant social concerns that require immediate action. Existing approaches to combat this challenge are generally impractical due to the scarcity of labeled samples and the imbalance of classes in real-world applications. To this end, we propose a novel Large Language Model-empowered Heterogeneous Graph Prompt Learning framework for illicit Drug Trafficking detection, called LLM-HetGDT, which leverages LLMs to help heterogeneous graph neural networks (HGNNs) effectively identify minority classes, i.e., drug trafficking participants, in class-imbalanced scenarios. Specifically, we first pre-train an HGNN on a contrastive pretext task to capture the inherent node and structure information in an unlabeled drug trafficking heterogeneous graph (HG). Afterward, to alleviate the class-imbalance issue, we leverage LLMs to augment the HG by generating high-quality synthetic user nodes in the minority classes. Then, we fine-tune the soft prompts on the augmented HG to capture the important information in the minority classes for the downstream drug trafficking detection task. To comprehensively study online illicit drug trafficking activities, we collect a new HG dataset from Twitter, called Twitter-HetDrug. Extensive experiments on this dataset demonstrate the effectiveness, efficiency, and applicability of our proposed method by comparing it with state-of-the-art baseline methods. Our source code is available at https://github.com/GraphResearcher/LLM-HetGDT.