Tianhao Niu
2025
Chart2Code53: A Large-Scale Diverse and Complex Dataset for Enhancing Chart-to-Code Generation
Tianhao Niu | Yiming Cui | Baoxin Wang | Xiao Xu | Xin Yao | Qingfu Zhu | Dayong Wu | Shijin Wang | Wanxiang Che
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Chart2code has recently received significant attention in the multimodal community due to its potential to reduce the burden of visualization and promote a more detailed understanding of charts. However, existing Chart2code training datasets suffer from at least one of the following issues: (1) limited scale, (2) limited chart-type coverage, and (3) inadequate complexity. To address these challenges, we seek more diverse sources that better align with real-world user distributions and propose dual data synthesis pipelines: (1) synthesis based on online plotting code and (2) synthesis based on chart images in academic papers. Using these pipelines, we create Chart2Code53, a large-scale Chart2code training dataset comprising 130K chart-code pairs across 53 chart types. Experimental results demonstrate that, even with few parameters, a model fine-tuned on Chart2Code53 achieves state-of-the-art performance among open-source models on multiple Chart2code benchmarks.