Varun Chandrasekaran
2026
SACTOR: LLM-Driven Correct and Idiomatic C to Rust Translation with Static Analysis and FFI-Based Verification
Tianyang Zhou | Ziyi Zhang | Haowen Lin | Somesh Jha | Mihai Christodorescu | Kirill Levchenko | Varun Chandrasekaran
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Translating software written in C to Rust has significant benefits in improving memory safety. However, manual translation is cumbersome, error-prone, and often produces unidiomatic code. Large language models (LLMs) have demonstrated promise in producing idiomatic translations, but offer no correctness guarantees. We propose SACTOR, an LLM-driven C-to-Rust translation tool that employs a two-step process: an initial "unidiomatic" translation to preserve semantics, followed by an "idiomatic" refinement to align with Rust standards. SACTOR leverages static analysis of the C source to handle pointer semantics and dependency resolution. To validate correctness of our function-wise incremental translation that mixes C and Rust, we use end-to-end testing via the foreign function interface. We evaluate SACTOR on 200 programs from two public datasets and on two more complex scenarios (a 50-sample subset of CRust-Bench and the libogg library), comparing multiple LLMs. Across datasets, SACTOR delivers high end-to-end correctness and produces safe, idiomatic Rust with up to 7× fewer Clippy warnings. On CRust-Bench, SACTOR achieves an average (across samples) of 85% unidiomatic and 52% idiomatic success, and on libogg it attains full unidiomatic and up to 78% idiomatic coverage with GPT-5.
2025
Attention Speaks Volumes: Localizing and Mitigating Bias in Language Models
Rishabh Adiga | Besmira Nushi | Varun Chandrasekaran
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
We believe that analyzing attention is crucial for understanding bias in large language models (LLMs); in ambiguous comparative prompting frameworks, it provides insight into how the LLM distributes its focus across different entities, and how this contributes to biased decisions. To this end, we first introduce a metric to quantify the “entity preference” of an LLM. We then propose ATLAS, a technique to localize bias to specific layers of the LLM by analyzing attention scores and then reduce bias by scaling attention in these biased layers. To evaluate our method, we conduct extensive experiments across 3 datasets, 4 models, and 4 baseline approaches. Our experiments demonstrate that bias is concentrated in the later layers, typically around the last third. We also show that ATLAS effectively mitigates bias through targeted interventions without compromising downstream performance, with an average increase of only 0.34% in perplexity when the intervention is applied. We see an average improvement of 0.28 points in the bias score across all the datasets.
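The idea of "scaling attention" mentioned in the abstract above can be illustrated with a toy sketch. This is not the ATLAS implementation: the function name, the scaling factor, and the example weights are all hypothetical, and the abstract does not specify how the scaling is applied. The sketch only shows one plausible form, in which the attention mass on a favored entity's token positions is multiplied by a factor and the distribution is renormalized.

```python
def scale_entity_attention(weights, entity_positions, factor):
    """Multiply attention on the given positions by `factor`, then renormalize.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    scaled = [w * factor if i in entity_positions else w
              for i, w in enumerate(weights)]
    total = sum(scaled)
    return [w / total for w in scaled]


# Assumed attention of the final token over four positions; position 1 holds
# the entity the model currently favors, position 3 the competing entity.
weights = [0.1, 0.5, 0.1, 0.3]
rebalanced = scale_entity_attention(weights, entity_positions={1}, factor=0.6)
```

With these assumed numbers the two entities end up with equal attention mass, while the result remains a valid probability distribution.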
2024
Designing Informative Metrics for Few-Shot Example Selection
Rishabh Adiga | Lakshmi Subramanian | Varun Chandrasekaran
Findings of the Association for Computational Linguistics: ACL 2024
Pretrained language models (PLMs) have shown remarkable few-shot learning capabilities when provided with properly formatted examples. However, selecting the “best” examples remains an open challenge. We propose a complexity-based prompt selection approach for sequence tagging tasks. This approach avoids the training of a dedicated model for selection of examples, and instead uses certain metrics to align the syntactico-semantic complexity of test sentences and examples. We use both sentence- and word-level metrics to match the complexity of examples to the (test) sentence being considered. Our results demonstrate that our approach extracts greater performance from PLMs: it achieves state-of-the-art performance on few-shot NER, with a 5% absolute improvement in F1 score on the CoNLL2003 dataset for GPT-4. We also see large gains of up to 28.85 points (F1/Acc.) in smaller models like GPT-J-6B.
Bypassing LLM Watermarks with Color-Aware Substitutions
Qilong Wu | Varun Chandrasekaran
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Watermarking approaches are proposed to identify whether text being circulated is human- or large language model (LLM)-generated. The state-of-the-art watermarking strategy of Kirchenbauer et al. (2023a) biases the LLM to generate specific (“green”) tokens. However, determining the robustness of this watermarking method under finite (low) edit budgets is an open problem. Additionally, existing attack methods fail to evade detection for longer text segments. We overcome these limitations, and propose Self Color Testing-based Substitution (SCTS), the first “color-aware” attack. SCTS obtains color information by strategically prompting the watermarked LLM and comparing output token frequencies. It uses this information to determine token colors, and substitutes green tokens with non-green ones. In our experiments, SCTS successfully evades watermark detection using fewer edits than related work. Additionally, we show both theoretically and empirically that SCTS can remove the watermark for arbitrarily long watermarked text.
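To make the green-token setting concrete, here is a minimal sketch of a green-list detector in the style of Kirchenbauer et al. and of what substituting green tokens does to its z-score. Everything here is a toy assumption: the integer vocabulary, the SHA-256 partition, the value of GAMMA, and all function names are hypothetical. In particular, the substitution step reads token colors directly from `is_green`, whereas SCTS, per the abstract, has to infer colors by prompting the watermarked LLM.

```python
import hashlib
import math

GAMMA = 0.5   # assumed fraction of the vocabulary that is green
VOCAB = 1000  # toy vocabulary of integer token ids


def is_green(prev_token, token, gamma=GAMMA):
    """Toy stand-in for the keyed hash that colors `token` given its predecessor."""
    digest = hashlib.sha256(f"{prev_token}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def z_score(tokens, gamma=GAMMA):
    """One-proportion z-test on the number of green tokens."""
    scored = len(tokens) - 1  # the first token has no predecessor
    green = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * scored) / math.sqrt(scored * gamma * (1 - gamma))


def make_watermarked(length, start=0):
    """Build a sequence in which every scored token is green (an extreme watermark)."""
    tokens = [start]
    for _ in range(length):
        prev = tokens[-1]
        tokens.append(next(t for t in range(VOCAB) if is_green(prev, t)))
    return tokens


def substitute_green(tokens, budget):
    """Replace up to `budget` green tokens, left to right, with non-green ones.

    Uses oracle access to `is_green`; this is the simplifying assumption.
    """
    out = list(tokens)
    edits = 0
    for i in range(1, len(out)):
        if edits >= budget:
            break
        if is_green(out[i - 1], out[i]):
            out[i] = next(t for t in range(VOCAB) if not is_green(out[i - 1], t))
            edits += 1
    return out


watermarked = make_watermarked(50)
attacked = substitute_green(watermarked, budget=10)
z_before = z_score(watermarked)
z_after = z_score(attacked)
```

In this toy setup each substitution removes one green token, so the z-score falls as the edit budget grows; a detector thresholding on the z-score would eventually stop flagging the text.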