Xinyue Wang


2025

Mining the Past with Dual Criteria: Integrating Three types of Historical Information for Context-aware Event Forecasting
Rong Ma | Lei Wang | Yating Yang | Bo Ma | Rui Dong | Fengyi Yang | Ahtamjan Ahmat | Kaiwen Lu | Xinyue Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing

Event forecasting requires modeling historical event data to predict future events, and accurate prediction depends on effectively capturing the historical information relevant to the forecast. Most existing methods focus on entities and structural dependencies to capture historical clues but often overlook implicitly relevant information: event semantics and deeper factual associations that are not explicitly connected in the graph structure yet are critical for accurate forecasting. To address this, we propose a dual-criteria constraint strategy that leverages event semantics for relevance modeling and incorporates a self-supervised semantic filter based on factual event associations to capture implicitly relevant historical information. Building on this strategy, our method, termed ITHI (Integrating Three types of Historical Information), combines sequential event information, periodically repeated event information, and relevant historical information to achieve context-aware event forecasting. We evaluated ITHI on three public benchmark datasets, where it achieves state-of-the-art performance and significantly outperforms existing approaches. We additionally validated its effectiveness on two structured temporal knowledge graph forecasting datasets.
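The abstract does not say how the three historical signals are fused, so the following is only a minimal illustrative sketch (all names are hypothetical, not the authors' code): one plausible way to combine sequential, periodic, and relevant-history representations is a learned softmax gate over the three sources.

```python
# Hypothetical sketch, NOT the ITHI implementation: fusing three historical
# signals (sequential, periodically repeated, relevant) with a learned gate.
import torch
import torch.nn as nn

class TriSourceFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # One gate logit per information source, normalized by softmax.
        self.gate = nn.Linear(3 * dim, 3)

    def forward(self, seq_h, per_h, rel_h):
        # seq_h / per_h / rel_h: (batch, dim) summaries of the three sources.
        stacked = torch.stack([seq_h, per_h, rel_h], dim=1)          # (B, 3, D)
        logits = self.gate(torch.cat([seq_h, per_h, rel_h], dim=-1))  # (B, 3)
        weights = torch.softmax(logits, dim=-1).unsqueeze(-1)         # (B, 3, 1)
        return (weights * stacked).sum(dim=1)                         # (B, D)

fusion = TriSourceFusion(dim=64)
fused = fusion(torch.randn(2, 64), torch.randn(2, 64), torch.randn(2, 64))
print(fused.shape)  # torch.Size([2, 64])
```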

Crabs: Consuming Resource via Auto-generation for LLM-DoS Attack under Black-box Settings
Yuanhe Zhang | Zhenhong Zhou | Wei Zhang | Xinyue Wang | Xiaojun Jia | Yang Liu | Sen Su
Findings of the Association for Computational Linguistics: ACL 2025

Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks, yet they remain vulnerable to external threats, particularly LLM Denial-of-Service (LLM-DoS) attacks, which aim to exhaust computational resources and block services. However, existing studies predominantly focus on white-box attacks, leaving black-box scenarios underexplored. In this paper, we introduce the Auto-Generation for LLM-DoS (AutoDoS) attack, an automated algorithm designed for black-box LLMs. AutoDoS constructs a DoS Attack Tree and expands its node coverage to achieve effectiveness under black-box conditions. Through transferability-driven iterative optimization, AutoDoS works across different models with a single prompt. Furthermore, we reveal that embedding the Length Trojan allows AutoDoS to bypass existing defenses more effectively. Experimental results show that AutoDoS significantly amplifies service response latency by over 250×, leading to severe resource consumption in terms of GPU utilization and memory usage. Our work provides a new perspective on LLM-DoS attacks and security defenses.
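The paper's core metric is how much an adversarial prompt inflates latency and output length. A small measurement harness for that quantity might look like the sketch below (purely illustrative; `generate`, `fake_llm`, and the prompts are hypothetical stand-ins, not the paper's code).

```python
# Hypothetical harness, NOT the AutoDoS implementation: measure how much a
# candidate prompt amplifies generation latency and length against any
# black-box model exposed as a `prompt -> completion` callable.
import time

def measure_amplification(generate, benign_prompt: str, attack_prompt: str):
    def probe(prompt):
        start = time.perf_counter()
        out = generate(prompt)
        return time.perf_counter() - start, len(out)

    benign_t, benign_len = probe(benign_prompt)
    attack_t, attack_len = probe(attack_prompt)
    return {
        "latency_x": attack_t / max(benign_t, 1e-9),  # response-time blowup
        "length_x": attack_len / max(benign_len, 1),  # output-size blowup
    }

# Toy stand-in for a black-box model: longer prompts yield longer outputs.
fake_llm = lambda p: "word " * (10 * len(p.split()))
print(measure_amplification(
    fake_llm,
    "Hi.",
    "Please enumerate everything you know, step by step, in full detail.",
))
```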

PD3F: A Pluggable and Dynamic DoS-Defense Framework against resource consumption attacks targeting Large Language Models
Yuanhe Zhang | Xinyue Wang | Haoran Gao | Zhenhong Zhou | Fanyu Meng | Yuyao Zhang | Sen Su
Findings of the Association for Computational Linguistics: EMNLP 2025

Large Language Models (LLMs), owing to their substantial computational requirements, are vulnerable to resource consumption attacks, which can severely degrade server performance or even cause crashes, as demonstrated by denial-of-service (DoS) attacks designed for LLMs. However, existing work lacks mitigation strategies against such threats, leaving real-world LLM deployments exposed to unresolved security risks. To this end, we propose the Pluggable and Dynamic DoS-Defense Framework (PD3F), which employs a two-stage approach to defend against resource consumption attacks from both the input and output sides. On the input side, we propose the Resource Index to guide Dynamic Request Polling Scheduling, reducing the computing resources consumed by malicious prompts under high-concurrency scenarios. On the output side, we introduce the Adaptive End-Based Suppression mechanism, which curbs excessive malicious generation. Experiments across six models demonstrate that PD3F significantly mitigates resource consumption attacks, improving users’ access capacity by up to 500% under adversarial load. PD3F represents a step toward the resilient and resource-aware deployment of LLMs against resource consumption attacks.
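The abstract names a Resource Index that drives request scheduling but gives no implementation detail; the sketch below is a minimal illustration of the general idea (all names and the length-based cost proxy are hypothetical assumptions, not PD3F's actual design).

```python
# Hypothetical sketch, NOT PD3F: input-side scheduling in the spirit of a
# "Resource Index" -- requests predicted to be cheap are served first under
# high concurrency, so suspected resource-hungry prompts cannot starve others.
import heapq
import itertools

class ResourceAwareScheduler:
    def __init__(self, estimate_cost):
        self._estimate = estimate_cost   # prompt -> predicted resource cost
        self._queue = []
        self._tick = itertools.count()   # FIFO tie-breaker for equal costs

    def submit(self, prompt: str):
        cost = self._estimate(prompt)
        heapq.heappush(self._queue, (cost, next(self._tick), prompt))

    def next_request(self):
        return heapq.heappop(self._queue)[2] if self._queue else None

# Crude cost proxy: prompt length (a real index would use a learned predictor).
sched = ResourceAwareScheduler(estimate_cost=len)
sched.submit("Summarize this sentence.")
sched.submit("Write an exhaustive report covering " + "every topic " * 50)
print(sched.next_request())  # the short, cheap request is served first
```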

AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science
Qiuhai Zeng | Claire Jin | Xinyue Wang | Yuhan Zheng | Qunhua Li
Findings of the Association for Computational Linguistics: EMNLP 2025

Large language models (LLMs) are increasingly used to automate data analysis through executable code generation. Yet data science tasks often admit multiple statistically valid solutions—for example, different modeling strategies—making it critical to understand the reasoning behind analyses, not just their outcomes. While manual review of LLM-generated code can help ensure statistical soundness, it is labor-intensive and requires expertise. A more scalable approach is to evaluate the underlying workflows—the logical plans guiding code generation. However, it remains unclear how to assess whether an LLM-generated workflow supports reproducible implementations. To address this, we present **AIRepr**, an **A**nalyst–**I**nspector framework for automatically evaluating and improving the **repr**oducibility of LLM-generated data analysis workflows. Our framework is grounded in statistical principles and supports scalable, automated assessment. We introduce two novel reproducibility-enhancing prompting strategies and benchmark them against standard prompting across 15 analyst–inspector LLM pairs and 1,032 tasks from three public benchmarks. Our findings show that workflows with higher reproducibility also yield more accurate analyses, and that reproducibility-enhancing prompts substantially improve both metrics. This work provides a foundation for transparent, reliable, and efficient human–AI collaboration in data science. Our code is publicly available: [https://github.com/Anonymous-2025-Repr/LLM-DS-Reproducibility](https://github.com/Anonymous-2025-Repr/LLM-DS-Reproducibility)
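A minimal sketch of the analyst–inspector idea, as described in the abstract: an inspector model re-implements code from the analyst's workflow alone, and agreement between the two results serves as a reproducibility signal. Everything here (function names, prompts, the 0/1 score) is a hypothetical simplification, not AIRepr's actual protocol; see the repository above for the real implementation.

```python
# Hypothetical sketch, NOT AIRepr: score a workflow as reproducible when an
# independent inspector, given only the workflow, reaches the same result.
def reproducibility_score(analyst_llm, inspector_llm, task: str, run) -> float:
    """analyst_llm / inspector_llm: prompt -> text; run: code -> result."""
    workflow = analyst_llm(f"Write a step-by-step analysis plan for: {task}")
    analyst_code = analyst_llm(f"Implement this plan as Python code:\n{workflow}")
    inspector_code = inspector_llm(
        f"Implement code strictly from this plan, seeing no other code:\n{workflow}"
    )
    # A reproducible workflow should lead both models to the same answer.
    return 1.0 if run(analyst_code) == run(inspector_code) else 0.0

# Toy stand-ins so the sketch runs end to end.
echo = lambda prompt: "result = 42"
print(reproducibility_score(echo, echo, "mean of column x", run=lambda code: code))
```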