Lukas Amadeus Kleybolte


2025

Instruction-tuned QwenChart for Chart Question Answering
Viviana Ventura | Lukas Amadeus Kleybolte | Alessandra Zarcone
Proceedings of the Fifth Workshop on Scholarly Document Processing (SDP 2025)

Charts, where information is delivered holistically by visual and textual features, represent a challenge when it comes to downstream tasks such as chart question answering, where both kinds of information contribute to the task. The standard approach is to decouple the task into two steps, first extracting information from the charts, or representing it as a table, text or code, and then a second reasoning step to output the answers. Today, the advancements in visual encoding of Visual Large Language Models (VLLM) have shown their capabilities to solve such complex tasks without using in-between representations of the charts or massive in-domain training. Our new instruction fine-tuned and chain-of-thought model QwenChart showed that even in a complex new benchmark such as SciVQA general models can achieve great performances with low-cost training, matching the capabilities that LLMs have shown in unimodal downstream tasks. An out-of-domain evaluation showed satisfactory results, albeit with an expected drop in performance.