Leveraging Generative AI for Extracting Business Requirements from Legacy COBOL and PL/I Code

Ankur Kalohia


Abstract
Recovering business requirements from COBOL and PL/I portfolios is difficult because logic is scattered across interdependent programs and data definitions, and existing analyses seldom yield stakeholder-facing artifacts. We introduce an LLM-augmented reverse-engineering pipeline that combines deterministic parsing, schema-constrained LLM generation, and bidirectional traceability to code. It couples grammar-based parsing and control-flow and data-flow analysis with a large language model that translates an enriched intermediate representation (IR) into structured specifications. This is not raw-code prompting or generic summarization; the novelty is LLM-centered generation over an enriched IR, with structured JSON outputs and traceability for compliance-sensitive settings. The pipeline produces business requirements documents, explicit rule catalogs, end-to-end data lineage, create–read–update–delete (CRUD) matrices, and field-level source-to-target mappings, each linked to the supporting code. In a financial-industry setting comprising more than 3.4M lines of COBOL including comments (3.2M excluding comments), the system achieves 93% agreement with expert-authored business rules and reduces documentation effort by approximately 70%, as measured against manually produced requirement documents and rule sets. On the internal corpus of 3.4M lines spanning online, batch, and job control workloads, the approach yields approximately 3.2–3.3× faster analysis while improving artifact consistency and traceability.
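Among the outputs listed above are create–read–update–delete (CRUD) matrices linked to the supporting code. The sketch below is not the paper's implementation; it is a minimal illustration, assuming DB2-style embedded SQL (`EXEC SQL ... END-EXEC`) is the only data access of interest, of how a deterministic pass can derive a program-by-table CRUD matrix together with the source lines that justify each cell. The function `crud_matrix`, the regex-based matching, and the sample program `ACCTUPD` are hypothetical. A real front end would use a proper COBOL grammar and would also cover file I/O verbs, cursors, dynamic SQL, multi-table statements, and commented-out lines, all of which this sketch ignores.

```python
import re
from collections import defaultdict

# Hypothetical sketch: regex matching stands in for a real COBOL/SQL parser.
# One embedded SQL block: EXEC SQL ... END-EXEC (may span several source lines).
EXEC_SQL = re.compile(r"EXEC\s+SQL\s+(.*?)\s*END-EXEC", re.IGNORECASE | re.DOTALL)

# Leading SQL verb -> (CRUD letter, pattern capturing the first target table).
VERB_RULES = {
    "SELECT": ("R", re.compile(r"\bFROM\s+([A-Z0-9_.]+)", re.IGNORECASE)),
    "INSERT": ("C", re.compile(r"\bINTO\s+([A-Z0-9_.]+)", re.IGNORECASE)),
    "UPDATE": ("U", re.compile(r"^UPDATE\s+([A-Z0-9_.]+)", re.IGNORECASE)),
    "DELETE": ("D", re.compile(r"\bFROM\s+([A-Z0-9_.]+)", re.IGNORECASE)),
}


def crud_matrix(programs):
    """Build a program x table CRUD matrix plus line-level evidence.

    programs: mapping of program name -> COBOL source text.
    Returns (matrix, evidence): matrix[program][table] is a string such as
    "RU", and evidence[(program, table, letter)] lists the 1-based source
    lines where each supporting EXEC SQL block starts.
    """
    ops = defaultdict(lambda: defaultdict(set))
    evidence = defaultdict(list)
    for program, source in programs.items():
        for block in EXEC_SQL.finditer(source):
            stmt = " ".join(block.group(1).split())  # collapse line breaks
            verb = stmt.split(" ", 1)[0].upper()
            if verb not in VERB_RULES:
                continue  # DECLARE CURSOR, FETCH, COMMIT, etc. are skipped here
            letter, pattern = VERB_RULES[verb]
            target = pattern.search(stmt)
            if target is None:
                continue
            table = target.group(1).upper()
            ops[program][table].add(letter)
            # Traceability: record the line where this EXEC SQL block begins.
            line = source.count("\n", 0, block.start()) + 1
            evidence[(program, table, letter)].append(line)
    # Render each cell's operation set in canonical C, R, U, D order.
    matrix = {
        program: {t: "".join(sorted(s, key="CRUD".index)) for t, s in tables.items()}
        for program, tables in ops.items()
    }
    return matrix, dict(evidence)


sample = {
    "ACCTUPD": """
       PROCEDURE DIVISION.
           EXEC SQL
               SELECT BAL INTO :WS-BAL FROM ACCOUNT
               WHERE ACCT_ID = :WS-ID
           END-EXEC.
           EXEC SQL
               UPDATE ACCOUNT SET BAL = :WS-BAL
               WHERE ACCT_ID = :WS-ID
           END-EXEC.
           EXEC SQL
               INSERT INTO AUDIT_LOG (ACCT_ID, AMT)
               VALUES (:WS-ID, :WS-AMT)
           END-EXEC.
""",
}

matrix, evidence = crud_matrix(sample)
print(matrix)    # {'ACCTUPD': {'ACCOUNT': 'RU', 'AUDIT_LOG': 'C'}}
print(evidence)
```

Keying the evidence by (program, table, operation) is one simple way to let every matrix cell point back to the code that produced it, in the spirit of the traceability the abstract emphasizes.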
Anthology ID: 2026.acl-industry.4
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month: July
Year: 2026
Address: San Diego, California, USA
Editors: Yunyao Li, Georg Rehm, Mei Tu
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 43–55
URL: https://preview.aclanthology.org/ingest-acl/2026.acl-industry.4/
Cite (ACL): Ankur Kalohia. 2026. Leveraging Generative AI for Extracting Business Requirements from Legacy COBOL and PL/I Code. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 43–55, San Diego, California, USA. Association for Computational Linguistics.
Cite (Informal): Leveraging Generative AI for Extracting Business Requirements from Legacy COBOL and PL/I Code (Kalohia, ACL 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.acl-industry.4.pdf