Marah Ghoummaid




2025

MATCH: Task-Driven Code Evaluation through Contrastive Learning
Marah Ghoummaid | Vladimir Tchuiev | Ofek Glick | Michal Moshkovitz | Dotan Di Castro
Findings of the Association for Computational Linguistics: EMNLP 2025

AI-based code generation is increasingly prevalent, with GitHub Copilot estimated to generate 46% of the code on GitHub. Accurately evaluating how well generated code aligns with developer intent remains a critical challenge. Traditional evaluation methods, such as unit tests, are often costly and hard to scale. Syntactic similarity metrics (e.g., BLEU, ROUGE) fail to capture code functionality, and metrics like CodeBERTScore require reference code, which is not always available. Reference-free evaluation remains largely unaddressed, with few alternatives such as ICE-Score; to fill this gap, this paper introduces MATCH, a novel reference-free metric. MATCH uses contrastive learning to generate meaningful embeddings for code and natural language task descriptions, enabling similarity scoring that reflects how well generated code implements the task. We show that MATCH achieves stronger correlations with functional correctness and human preference than existing metrics across multiple programming languages.
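As a rough illustration of the scoring idea in the abstract (not the authors' released implementation), the sketch below embeds a task description and a candidate program with a shared pretrained encoder and scores them by cosine similarity. The encoder name (microsoft/codebert-base) and the mean-pooling step are stand-ins chosen for illustration; MATCH additionally fine-tunes its encoder with a contrastive objective, which is omitted here.

```python
# Minimal sketch: reference-free scoring of code against a natural language
# task description via shared embeddings and cosine similarity.
# NOTE: the encoder below is a stand-in (CodeBERT, not contrastively
# fine-tuned); it only demonstrates the shape of the approach.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModel.from_pretrained("microsoft/codebert-base")

def embed(text: str) -> torch.Tensor:
    """Mean-pool the encoder's last hidden states into one vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)    # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

task = "Return the factorial of a non-negative integer n."
code = "def factorial(n):\n    return 1 if n <= 1 else n * factorial(n - 1)"

# Cosine similarity in embedding space serves as the task-code score.
score = F.cosine_similarity(embed(task), embed(code)).item()
print(f"similarity: {score:.3f}")
```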