A Computational Method for Measuring Open Codes in Qualitative Analysis

John Chen; Alexandros Nikolaos Lotsos; Sihan Cheng; Lexie Zhao; Yanjia Zhang; Jessica Hullman; Bruce Sherin; Uri Wilensky; Michael Horn

Abstract

Qualitative analysis is critical to understanding human datasets in many social science disciplines. A central method in this process is inductive coding, where researchers identify and interpret codes directly from the datasets themselves. Yet, this exploratory approach poses challenges for meeting methodological expectations (such as "depth" and "variation"), especially as researchers increasingly adopt Generative AI (GAI) for support. Ground-truth-based metrics are insufficient because they contradict the exploratory nature of inductive coding; cluster- or topic-level metrics fail to capture the interpretive, cross-cutting nature of qualitative codes; and manual evaluation can be labor-intensive. This paper presents a theory-informed computational method for measuring inductive coding results from humans and GAI. Our method first merges individual codebooks into an Aggregated Code Space using an LLM-enriched hierarchical clustering algorithm. It then measures each coder’s contribution against the merged result using four novel metrics: Coverage, Overlap, Novelty, and Divergence, designed to capture breadth, consensus, unique contribution, and systematic deviation without assuming ground truth. Through two experiments on a human-coded online conversation dataset, we 1) reveal the merging algorithm’s impact on metrics; 2) validate the metrics’ stability and robustness across multiple runs and different LLMs; and 3) showcase the metrics’ ability to diagnose coding issues, such as excessive or irrelevant (hallucinated) codes. We discuss how these metrics should be interpreted in combination and their current limitations. Our work provides a reliable pathway for ensuring methodological rigor in human-AI qualitative analysis.
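The abstract names the four metrics but does not define them. The following is a purely illustrative sketch, not the authors' definitions. It assumes the Aggregated Code Space has already been built by the merging step, which is not sketched here. It represents that space with a hypothetical `CodeSpace` mapping: each merged cluster maps to a count of codes per coder. The functions `coverage`, `overlap`, `novelty`, and `divergence` give one plausible reading of breadth, consensus, unique contribution, and systematic deviation. For Divergence, the Jensen-Shannon divergence is a swapped-in choice.

```python
from collections import Counter
from math import log2

# HYPOTHETICAL representation of an Aggregated Code Space (assumed, not from the paper):
# each merged cluster ID maps to a Counter of {coder: number of that coder's codes in it}.
CodeSpace = dict[str, Counter]


def clusters_of(space: CodeSpace, coder: str) -> set[str]:
    """Merged clusters that contain at least one code from `coder`."""
    return {k for k, counts in space.items() if counts[coder] > 0}


def coverage(space: CodeSpace, coder: str) -> float:
    """Breadth (assumed definition): share of all merged clusters the coder reaches."""
    return len(clusters_of(space, coder)) / len(space)


def overlap(space: CodeSpace, coder: str) -> float:
    """Consensus (assumed): share of the coder's clusters also reached by another coder."""
    mine = clusters_of(space, coder)
    if not mine:
        return 0.0
    shared = {k for k in mine
              if any(n > 0 for other, n in space[k].items() if other != coder)}
    return len(shared) / len(mine)


def novelty(space: CodeSpace, coder: str) -> float:
    """Unique contribution (assumed): share of all clusters reached by this coder alone."""
    solo = {k for k in clusters_of(space, coder)
            if all(n == 0 for other, n in space[k].items() if other != coder)}
    return len(solo) / len(space)


def divergence(space: CodeSpace, coder: str) -> float:
    """Systematic deviation (assumed): Jensen-Shannon divergence, base 2 so it lies
    in [0, 1], between the coder's distribution of codes over clusters and the
    pooled distribution of all coders' codes."""
    keys = sorted(space)
    mine = [space[k][coder] for k in keys]
    pooled = [sum(space[k].values()) for k in keys]
    total_mine, total_pooled = sum(mine), sum(pooled)
    if total_mine == 0:
        raise ValueError(f"coder {coder!r} has no codes in the space")
    p = [x / total_mine for x in mine]
    q = [x / total_pooled for x in pooled]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: list[float], b: list[float]) -> float:
        # Kullback-Leibler divergence; zero-probability terms contribute nothing.
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy example with two coders, A and B, and four merged clusters.
space: CodeSpace = {
    "k1": Counter({"A": 2, "B": 1}),
    "k2": Counter({"A": 1}),
    "k3": Counter({"B": 3}),
    "k4": Counter({"A": 1, "B": 1}),
}
for coder in ("A", "B"):
    print(coder, coverage(space, coder), round(overlap(space, coder), 3),
          novelty(space, coder), round(divergence(space, coder), 3))
```

Under these toy definitions, Novelty equals Coverage * (1 - Overlap), so the three set-based scores are not independent. This is one reason such scores are best read together rather than in isolation. The paper's actual definitions may differ.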

Anthology ID: 2026.findings-acl.2073
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 41740–41758
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2073/
Cite (ACL): John Chen, Alexandros Nikolaos Lotsos, Sihan Cheng, Lexie Zhao, Yanjia Zhang, Jessica Hullman, Bruce Sherin, Uri Wilensky, and Michael Horn. 2026. A Computational Method for Measuring Open Codes in Qualitative Analysis. In Findings of the Association for Computational Linguistics: ACL 2026, pages 41740–41758, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): A Computational Method for Measuring Open Codes in Qualitative Analysis (Chen et al., Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2073.pdf
Checklist: 2026.findings-acl.2073.checklist.pdf