Jason Zhang

2025

pdf bib abs
Decoding Actionability: A Computational Analysis of Teacher Observation Feedback
Mayank Sharma | Jason Zhang
Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)

This study presents a computational analysis to classify actionability in teacher feedback. We fine-tuned a RoBERTa model on 662 manually annotated feedback examples from West African classrooms, achieving strong classification performance (accuracy = 0.94, precision = 0.90, recall = 0.96, f1 = 0.93). This enabled classification of over 12,000 feedback instances. A comparison of linguistic features indicated that actionable feedback was associated with lower word count but higher readability, greater lexical diversity, and more modifier usage. These findings suggest that concise, accessible language with precise descriptive terms may be more actionable for teachers. Our results support focusing on clarity in teacher observation protocols while demonstrating the potential of computational approaches in analyzing educational feedback at scale.

LLM jailbreaks are a widespread safety challenge. Given this problem has not yet been tractable, we suggest targeting a key failure mechanism: the failure of safety to generalize across semantically equivalent inputs. We further focus the target by requiring desirable tractability properties of attacks to study: explainability, transferability between models, and transferability between goals. We perform red-teaming within this framework by uncovering new vulnerabilities to multi-turn, multi-image, and translation-based attacks. These attacks are semantically equivalent by our design to their single-turn, single-image, or untranslated counterparts, enabling systematic comparisons; we show that the different structures yield different safety outcomes. We then demonstrate the potential for this framework to enable new defenses by proposing a Structure Rewriting Guardrail, which converts an input to a structure more conducive to safety assessment. This guardrail significantly improves refusal of harmful inputs, without over-refusing benign ones. Thus, by framing this intermediate challenge—more tractable than universal defenses but essential for long-term safety—we highlight a critical milestone for AI safety research.

2022

This paper mainly describes the dma submission to the TempoWiC task, which achieves a macro-F1 score of 77.05% and attains the first place in this task. We first explore the impact of different pre-trained language models. Then we adopt data cleaning, data augmentation, and adversarial training strategies to enhance the model generalization and robustness. For further improvement, we integrate POS information and word semantic representation using a Mixture-of-Experts (MoE) approach. The experimental results show that MoE can overcome the feature overuse issue and combine the context, POS, and word semantic features well. Additionally, we use a model ensemble method for the final prediction, which has been proven effective by many research works.

Jason Zhang

2025

2022

2002

Co-authors

Venues