Abstract
Processing tabular data is important across many domains and applications. This study investigates the performance and limitations of fine-tuned models for tabular data analysis, focusing on fine-tuning an English-language model to obtain a German-language model. To validate the effectiveness of this transfer learning approach, the study compares the fine-tuned German model with the original English model on the test data of the German dataset. As an additional point of comparison, a potential shortcut translates the German test data into English before passing it to the English model. The results show that the fine-tuned model significantly outperforms the original model, demonstrating that transfer learning is effective even with a limited amount of training data. The English model also processes translated German tabular data effectively, albeit with a slight drop in accuracy compared to the fine-tuned model. The evaluation further extends to real-world tables extracted from the sustainability reports of a financial institution. On these tables, which are unrelated to the training data, the fine-tuned model again performs best at extracting knowledge, indicating its potential applicability in practical scenarios. The paper also releases the first manually annotated dataset for German Table Question Answering, together with the associated annotation tool.
- Anthology ID: 2024.lrec-main.1354
- Volume: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Month: May
- Year: 2024
- Address: Torino, Italia
- Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
- Venues: LREC | COLING
- Publisher: ELRA and ICCL
- Pages: 15579–15584
- URL: https://aclanthology.org/2024.lrec-main.1354
- Cite (ACL): Dominik Andreas Kowieski, Michael Hellwig, and Thomas Feilhauer. 2024. TAPASGO: Transfer Learning towards a German-Language Tabular Question Answering Model. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 15579–15584, Torino, Italia. ELRA and ICCL.
- Cite (Informal): TAPASGO: Transfer Learning towards a German-Language Tabular Question Answering Model (Kowieski et al., LREC-COLING 2024)
- PDF: https://preview.aclanthology.org/landing_page/2024.lrec-main.1354.pdf