Measuring Visual Understanding in Telecom domain: Performance Metrics for Image-to-UML conversion using VLMs

H. G. Ranjani, Rutuja Prabhudesai


Abstract
Telecom domain 3GPP documents are replete with images containing sequence diagrams. Advances in Vision-Language Large Models (VLMs) have eased conversion of such images to machine-readable PlantUML (puml) formats. However, there is a gap in the evaluation of such conversions: existing works do not compare puml scripts across their various components. In this work, we propose performance metrics to measure the effectiveness of such conversions. A subset of sequence diagrams from 3GPP documents is chosen to be representative of actual domain-specific scenarios. We compare puml outputs from two VLMs, Claude Sonnet and GPT-4V, against manually created ground truth representations. We use version control tools to capture differences and introduce standard performance metrics to measure accuracy along various components: participant identification, message flow accuracy, sequence ordering, and grouping construct preservation. We demonstrate the effectiveness of the proposed metrics in quantifying conversion errors across these components. The results show that nodes, edges and messages are accurately captured. However, we observe that VLMs do not necessarily perform well on complex structures such as notes, boxes, and groups. Our experiments and performance metrics indicate a need for better representation of these components in training data for fine-tuned VLMs.
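The abstract does not give the metric definitions, so what follows is only a rough sketch of what a component-wise comparison of a VLM-generated puml script against a ground truth script might look like. Everything in it is an assumption made for illustration: the toy diagrams, the simplified regular expressions (real PlantUML syntax is far richer), the helper names `parse_puml`, `precision_recall_f1` and `diff_counts`, and the choice of precision/recall/F1 as the "standard" metrics. Python's `difflib` stands in here for the version control tooling mentioned above.

```python
import re
from difflib import unified_diff

# Toy ground truth and VLM output (hypothetical, not taken from the paper).
GROUND_TRUTH = """@startuml
participant UE
participant AMF
participant SMF
UE -> AMF : Registration Request
AMF -> SMF : Create Session
AMF -> UE : Registration Accept
@enduml"""

VLM_OUTPUT = """@startuml
participant UE
participant AMF
UE -> AMF : Registration Request
AMF -> UE : Registration Accept
@enduml"""


def parse_puml(text):
    """Extract participants and messages using deliberately simplified patterns."""
    participants, messages = set(), []
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"(?:participant|actor)\s+(\w+)", line)
        if m:
            participants.add(m.group(1))
            continue
        m = re.match(r"(\w+)\s*-+>\s*(\w+)\s*:\s*(.+)", line)
        if m:
            # A message is (sender, receiver, label).
            messages.append((m.group(1), m.group(2), m.group(3).strip()))
    return participants, messages


def precision_recall_f1(predicted, reference):
    """Set-based precision, recall and F1 for one component type."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def diff_counts(reference_text, predicted_text):
    """Count lines missing from / added to the prediction, diff-style."""
    diff = list(unified_diff(reference_text.splitlines(),
                             predicted_text.splitlines(), lineterm=""))
    body = diff[2:]  # skip the '---' and '+++' header lines
    missing = sum(1 for d in body if d.startswith("-"))
    added = sum(1 for d in body if d.startswith("+"))
    return missing, added


gt_participants, gt_messages = parse_puml(GROUND_TRUTH)
vlm_participants, vlm_messages = parse_puml(VLM_OUTPUT)

participant_scores = precision_recall_f1(vlm_participants, gt_participants)
message_scores = precision_recall_f1(vlm_messages, gt_messages)
missing_lines, added_lines = diff_counts(GROUND_TRUTH, VLM_OUTPUT)

print("participants (P, R, F1):", participant_scores)
print("messages     (P, R, F1):", message_scores)
print("diff lines missing/added:", missing_lines, added_lines)
```

Scoring each component type separately, rather than reporting one overall similarity figure, is what would let errors in notes, boxes or groups show up even when participants and messages are mostly correct. Extending the sketch to those constructs would need further patterns, which are omitted here.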
Anthology ID:
2025.eval4nlp-1.2
Volume:
Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems
Month:
December
Year:
2025
Address:
Mumbai, India
Editors:
Mousumi Akter, Tahiya Chowdhury, Steffen Eger, Christoph Leiter, Juri Opitz, Erion Çano
Venues:
Eval4NLP | WS
Publisher:
Association for Computational Linguistics
Pages:
9–20
URL:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.eval4nlp-1.2/
Cite (ACL):
H. G. Ranjani and Rutuja Prabhudesai. 2025. Measuring Visual Understanding in Telecom domain: Performance Metrics for Image-to-UML conversion using VLMs. In Proceedings of the 5th Workshop on Evaluation and Comparison of NLP Systems, pages 9–20, Mumbai, India. Association for Computational Linguistics.
Cite (Informal):
Measuring Visual Understanding in Telecom domain: Performance Metrics for Image-to-UML conversion using VLMs (Ranjani & Prabhudesai, Eval4NLP 2025)
PDF:
https://preview.aclanthology.org/ingest-ijcnlp-aacl/2025.eval4nlp-1.2.pdf