Jing Yang

Papers on this page may belong to the following people: Jing Yang, Jing Yang, Jing Yang (Campinas)

2026

hermeneutichools at SemEval-2026 Task 4: Multiperspectivity as a Resource for Narrative Similarity Prediction
Max Upravitelev | Veronika Solopova | Jing Yang | Charlott Jakob | Premtim Sahitaj | Ariana Sahitaj | Vera Schmitt
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

Predicting narrative similarity can be understood as an inherently interpretive task: different, equally valid readings of the same text can produce divergent interpretations and thus different similarity judgments, posing a fundamental challenge for semantic evaluation benchmarks that encode a single ground truth. Rather than treating this multiperspectivity as a challenge to overcome, we propose to incorporate it in the decision making process of predictive systems. To explore this strategy, we created an ensemble of 31 LLM personas. These range from practitioners following interpretive frameworks to more intuitive, lay-style characters. Our experiments were conducted on the SemEval-2026 Task 4 dataset, where the system ranked 13th out of 47 teams and achieved an accuracy score of 0.705. Accuracy improves with ensemble size, consistent with Condorcet Jury Theorem-like dynamics under weakened independence. Practitioner personas perform worse individually but produce less correlated errors, yielding larger ensemble gains under majority voting. Our error analysis reveals a consistent negative association between gender-focused interpretive vocabulary and accuracy across all persona categories, suggesting either attention to dimensions not relevant for the benchmark or valid interpretations absent from the ground truth. This finding underscores the need for evaluation frameworks that account for interpretive plurality.
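
The Condorcet-style effect mentioned in the abstract above can be illustrated with a minimal sketch. It assumes fully independent voters that all share one accuracy, which is an idealization: the abstract itself speaks of weakened independence. The function name `majority_accuracy` and the per-voter accuracy of 0.6 are hypothetical and not taken from the paper.

```python
from math import comb


def majority_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent voters is correct.

    Assumes every voter is right with the same probability p_correct and
    that votes are independent (the classical Condorcet Jury setting).
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


if __name__ == "__main__":
    # Hypothetical per-voter accuracy; ensemble sizes chosen for illustration.
    for n in (1, 3, 11, 31):
        print(n, round(majority_accuracy(n, 0.6), 3))
```

With a per-voter accuracy above 0.5, the majority-vote accuracy grows with the (odd) ensemble size in this idealized setting; correlated errors between voters would reduce the gain.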

Multi-round Vision-Language Model (VLM) Multi-Agent Systems (MAS) offer powerful reasoning capabilities but suffer from prohibitive costs due to static panel designs, where all N agents communicate at every T round. This approach is fundamentally inefficient, as it ignores the context-dependent and diminishing marginal utility of specific agents. To address this, we propose Nash-CredMAS, an economic framework that transforms agent selection into a dynamic resource allocation game. Unlike heuristic routing or one-time pruning, our method operates in two phases: (1) Offline Causal Value Learning, where we employ a doubly-robust (AIPW) estimator to train a context-aware value function from biased interaction logs, effectively learning the true marginal contribution of agents; and (2) Online Dynamic Auctions, where agents bid for communication slots based on their predicted utility. We formulate the inference-time selection as a submodular maximization problem under budget constraints, theoretically guaranteeing a (1 - 1/e)-approximation of the optimal coalition via a greedy strategy. Empirically, Nash-CredMAS achieves state-of-the-art results on challenging benchmarks, including MMMU and V*-Bench, while reducing token consumption by over 25% compared to static baselines. The system naturally converges to an economic equilibrium where agents actively remain silent when their marginal value does not justify the cost.
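
The greedy strategy named in the abstract above can be sketched as follows. This is not the paper's implementation: it assumes a simple cardinality budget and uses a set-coverage objective as a stand-in for the learned value function, since coverage is a standard monotone submodular function for which the classical (1 - 1/e) greedy guarantee holds. The names `greedy_select` and `candidates` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_select(candidates: Dict[str, Set[int]], budget: int) -> List[str]:
    """Pick up to `budget` candidates, each time taking the largest marginal gain.

    The objective here is coverage (number of distinct items covered), used
    only as an illustrative monotone submodular function.
    """
    chosen: List[str] = []
    covered: Set[int] = set()
    for _ in range(budget):
        best_name, best_gain = None, 0
        for name in sorted(candidates):  # sorted for deterministic tie-breaking
            if name in chosen:
                continue
            gain = len(candidates[name] - covered)  # marginal contribution
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:  # no candidate adds value: stop early
            break
        chosen.append(best_name)
        covered |= candidates[best_name]
    return chosen


if __name__ == "__main__":
    # Hypothetical agents, each "covering" some pieces of evidence.
    agents = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
    print(greedy_select(agents, budget=2))
```

The early stop when no candidate has positive marginal gain loosely mirrors the abstract's remark that agents stay silent when their marginal value does not justify the cost.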

RACC: Regret-Aware Confidence Calibration for Consistent Masked Discrete Diffusion Decoding
Qinglin Zeng | Jusheng Zhang | Jing Yang | Ningyuan Liu | Keze Wang
Findings of the Association for Computational Linguistics: ACL 2026

Masked Discrete Diffusion Models (MDMs) enable parallel generation via iterative refinement. However, we identify a critical decisional mismatch. The MDM architecture is inherently dynamic and capable of sensing context shifts. In contrast, prevailing decoding paradigms remain static and myopic. They treat each denoising step as an isolated snapshot, effectively discarding valuable temporal feedback that signals logical conflicts. To bridge this gap, we propose Regret-Aware Confidence Calibration (RACC). This training-free framework aligns decoding decisions with the model’s latent self-correction capabilities. RACC introduces a momentum anchor to track confidence trajectories. When a token’s probability drops abruptly below its historical trend, the system triggers a "regret" signal. Unlike expensive re-masking or lookahead search, RACC utilizes this signal to proactively demote unstable candidates. Extensive experiments on reasoning benchmarks, such as HumanEval and GSM8K, demonstrate that RACC significantly improves generation consistency. Crucially, RACC achieves these gains with zero additional inference overhead, effectively balancing decoding quality and efficiency.
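
One way to picture the "regret" trigger described in the abstract above is the following minimal sketch. It assumes the momentum anchor is an exponential moving average of a token's confidence and that regret fires when the current confidence falls below the anchor by a fixed margin; neither choice is confirmed by the abstract. The names `regret_flags`, `beta`, and `drop` are hypothetical.

```python
from typing import List, Optional


def regret_flags(probs: List[float], beta: float = 0.9, drop: float = 0.2) -> List[bool]:
    """Flag denoising steps where confidence falls abruptly below its trend.

    probs: one token's confidence at successive denoising steps.
    beta:  smoothing factor of the moving-average anchor (assumed form).
    drop:  margin below the anchor that counts as an abrupt drop (assumed).
    """
    flags: List[bool] = []
    anchor: Optional[float] = None
    for p in probs:
        if anchor is None:
            flags.append(False)  # no history yet, so no regret signal
            anchor = p
        else:
            flags.append(p < anchor - drop)  # compare against historical trend
            anchor = beta * anchor + (1 - beta) * p  # update the anchor
    return flags


if __name__ == "__main__":
    # Hypothetical confidence trajectory with a sudden collapse at the end.
    print(regret_flags([0.5, 0.6, 0.7, 0.2], beta=0.5, drop=0.2))
```

In a decoder, a flagged token would be a candidate for demotion rather than commitment; how the actual method uses the signal is described only at a high level in the abstract.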

Diffusion Large Language Models (dLLMs) have emerged as a promising non-autoregressive paradigm for text generation, offering parallel decoding and bidirectional context modeling. However, aligning dLLMs with reinforcement learning (RL) remains a significant challenge, as the marginal likelihood of sequences in masked diffusion is typically intractable, rendering standard policy gradient methods unstable or computationally prohibitive. In this work, we propose Diffusion-Gibbs Alignment (DGA), a novel variational framework that reformulates RL for dLLMs as a distribution matching problem. DGA bypasses the explicit computation of log-probabilities by leveraging a learned energy function to model the relative quality of samples. The optimization is decoupled into two stable steps: (1) contrastive energy ranking to capture global reward structures, and (2) weighted diffusion alignment to update the policy via importance sampling. Empirically, DGA establishes a new state-of-the-art across logical reasoning (Sudoku, Countdown), mathematical reasoning (GSM8K, Math500), and code generation (HumanEval, MBPP) benchmarks. DGA offers a novel variational perspective for dLLM alignment, achieving better performance while simultaneously enhancing training speed and memory efficiency.

2025

CCG: Rare-Label Prediction via Neural SEM–Driven Causal Game
Yijia Fan | Jusheng Zhang | Kaitong Cai | Jing Yang | Keze Wang
Findings of the Association for Computational Linguistics: EMNLP 2025

Multi-label classification (MLC) faces persistent challenges from label imbalance, spurious correlations, and distribution shifts, especially in rare label prediction. We propose the Causal Cooperative Game (CCG) framework, which models MLC as a multi-player cooperative process. CCG integrates explicit causal discovery via Neural Structural Equation Models, a counterfactual curiosity reward to guide robust feature learning, and a causal invariance loss to ensure generalization across environments, along with targeted rare label enhancement. Extensive experiments on benchmark datasets demonstrate that CCG significantly improves rare label prediction and overall robustness compared to strong baselines. Ablation and qualitative analyses further validate the effectiveness and interpretability of each component. Our work highlights the promise of combining causal inference and cooperative game theory for more robust and interpretable multi-label learning.

Digital social media platforms frequently contribute to cognitive-behavioral fixation, a phenomenon in which users exhibit sustained and repetitive engagement with narrow content domains. While cognitive-behavioral fixation has been extensively studied in psychology, methods for computationally detecting and evaluating such fixation remain underexplored. To address this gap, we propose a novel framework for assessing cognitive-behavioral fixation by analyzing users’ multimodal social media engagement patterns. Specifically, we introduce a multimodal topic extraction module and a cognitive-behavioral fixation quantification module that collaboratively enable adaptive, hierarchical, and interpretable assessment of user behavior. Experiments on existing benchmarks and a newly curated multimodal dataset demonstrate the effectiveness of our approach, laying the groundwork for scalable computational analysis of cognitive fixation. All code in this project is publicly available for research purposes at https://github.com/Liskie/cognitive-fixation-evaluation.

2024

Rethinking Word-level Adversarial Attack: The Trade-off between Efficiency, Effectiveness, and Imperceptibility
Pengwei Zhan | Jing Yang | He Wang | Chao Zheng | Liming Wang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Neural language models have demonstrated impressive performance in various tasks but remain vulnerable to word-level adversarial attacks. Word-level adversarial attacks can be formulated as a combinatorial optimization problem, and thus, an attack method can be decomposed into search space and search method. Despite the significance of these two components, previous works inadequately distinguish them, which may lead to unfair comparisons and insufficient evaluations. In this paper, to address the inappropriate practices in previous works, we perform thorough ablation studies on the search space, illustrating the substantial influence of search space on attack efficiency, effectiveness, and imperceptibility. Based on the ablation study, we propose two standardized search spaces: the Search Space for ImPerceptibility (SSIP) and Search Space for EffecTiveness (SSET). The reevaluation of eight previous attack methods demonstrates the success of SSIP and SSET in achieving better trade-offs between efficiency, effectiveness, and imperceptibility in different scenarios, offering fair and comprehensive evaluations of previous attack methods and providing potential guidance for future works.

2023

Neural language models are vulnerable to word-level adversarial text attacks, which generate adversarial examples by directly substituting discrete input words. Previous search methods for word-level attacks assume that the information in the important words is more influential on prediction than unimportant words. In this paper, motivated by this assumption, we propose a self-supervised regularization method for Similarizing the Influence of Words with Contrastive Learning (SIWCon) that encourages the model to learn sentence representations in which words of varying importance have a more uniform influence on prediction. Experiments show that SIWCon is compatible with various training methods and effectively improves model robustness against various unforeseen adversarial attacks. The effectiveness of SIWCon is also intuitively shown through qualitative analysis and visualization of the loss landscape, sentence representation, and changes in model confidence.

Contrastive Learning with Adversarial Examples for Alleviating Pathology of Language Model
Pengwei Zhan | Jing Yang | Xiao Huang | Chunlei Jing | Jingying Li | Liming Wang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Neural language models have achieved superior performance. However, these models also suffer from the pathology of overconfidence in the out-of-distribution examples, potentially making the model difficult to interpret and making the interpretation methods fail to provide faithful attributions. In this paper, we explain the model pathology from the view of sentence representation and argue that the counter-intuitive bias degree and direction of the out-of-distribution examples’ representation cause the pathology. We propose a Contrastive learning regularization method using Adversarial examples for Alleviating the Pathology (ConAAP), which calibrates the sentence representation of out-of-distribution examples. ConAAP generates positive and negative examples following the attribution results and utilizes adversarial examples to introduce direction information in regularization. Experiments show that ConAAP effectively alleviates the model pathology while slightly impacting the generalization ability on in-distribution examples and thus helps interpretation methods obtain more faithful results.

2022

This paper describes USTC-NELSLIP’s submissions to the IWSLT 2022 Offline Speech Translation task, including speech translation of talks from English to German, English to Chinese and English to Japanese. We describe both cascaded architectures and end-to-end models which can directly translate source speech into target text. In the cascaded condition, we investigate the effectiveness of different model architectures with robust training and achieve 2.72 BLEU improvements over last year’s optimal system on the MuST-C English-German test set. In the end-to-end condition, we build models based on Transformer and Conformer architectures, achieving 2.26 BLEU improvements over last year’s optimal end-to-end system. The end-to-end system has obtained promising results, but it is still lagging behind our cascaded models.

Neural networks are vulnerable to adversarial examples. The adversary can successfully attack a model even without knowing model architecture and parameters, i.e., under a black-box scenario. Previous works on word-level attacks widely use word importance ranking (WIR) methods and complex search methods, including greedy search and heuristic algorithms, to find optimal substitutions. However, these methods fail to balance the attack success rate and the cost of attacks, such as the number of queries to the model and the time consumption. In this paper, we propose PAthological woRd Saliency sEarch (PARSE) that performs the search under a dynamic search space following the subarea importance. Experiments show that PARSE can achieve comparable attack success rates to complex search methods while saving numerous queries and time, e.g., saving at most 74% of queries and 90% of time compared with greedy search when attacking the examples from the Yelp dataset. The adversarial examples crafted by PARSE are also of high quality, highly transferable, and can effectively improve model robustness in adversarial training.

Few-shot named entity recognition (NER) enables us to build a NER system for a new domain using very few labeled examples. However, existing prototypical networks for this task suffer from roughly estimated label dependency and closely distributed prototypes, thus often causing misclassifications. To address the above issues, we propose EP-Net, an Entity-level Prototypical Network enhanced by dispersedly distributed prototypes. EP-Net builds entity-level prototypes and considers text spans to be candidate entities, so it no longer requires the label dependency. In addition, EP-Net trains the prototypes from scratch to distribute them dispersedly and aligns spans to prototypes in the embedding space using a space projection. Experimental results on two evaluation tasks and the Few-NERD settings demonstrate that EP-Net consistently outperforms the previous strong models in terms of overall performance. Extensive analyses further validate the effectiveness of EP-Net.