CodeClarity: A Framework and Benchmark for Evaluating Multilingual Code Summarization

Madhurima Chakraborty, Drishti Sharma, Maryam Sikander, Eman Nisar


Abstract
Large Language Models (LLMs) are increasingly used to summarize and document code, yet most research and training data remain limited to English. This creates barriers for developers working in other languages and leaves the multilingual capabilities of LLMs largely unexplored. We present CodeClarity, a framework for evaluating multilingual code summarization across six programming and six natural languages. It combines reference-based metrics, LLM-judge ratings, and faithfulness checks (identifiers and script) to capture surface similarity, semantic adequacy, and code-aware fidelity. Our experiments reveal that lexical metrics penalize morphologically rich languages, while judge-based evaluations provide more stable, semantically aligned assessments. This work establishes the first reproducible foundation for studying multilingual code summarization and points toward fairer, more inclusive evaluation of code intelligence systems. CodeClarity-Bench and the full evaluation pipeline are publicly available at huggingface.co/CodeClarity and github.com/MadhuNimmo/CodeClarity, enabling community-scale human validation and follow-up studies.
Anthology ID:
2026.lrec-main.511
Volume:
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Month:
May
Year:
2026
Address:
Palma de Mallorca, Spain
Editors:
Stelios Piperidis, Núria Bel, Henk van den Heuvel, Nancy Ide, Simon Krek, Antonio Toral
Venue:
LREC
Publisher:
ELRA Language Resource Association
Note:
Pages:
6439–6451
URL:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.511/
Cite (ACL):
Madhurima Chakraborty, Drishti Sharma, Maryam Sikander, and Eman Nisar. 2026. CodeClarity: A Framework and Benchmark for Evaluating Multilingual Code Summarization. In Proceedings of the Fifteenth Language Resources and Evaluation Conference, pages 6439–6451, Palma de Mallorca, Spain. ELRA Language Resource Association.
Cite (Informal):
CodeClarity: A Framework and Benchmark for Evaluating Multilingual Code Summarization (Chakraborty et al., LREC 2026)
PDF:
https://preview.aclanthology.org/ingest-lrec/2026.lrec-main.511.pdf