SPEADO: Segmentation and Punctuation for Ancient Chinese Texts via Example Augmentation and Decoding Optimization

Tian Xia, Kai Yu, Qianrong Yu, Xinran Peng


Abstract
The SPEADO model for sentence segmentation and punctuation tasks in ancient Chinese texts is proposed, which incorporates text chunking and MinHash indexing techniques to realise example argumentation. Additionally, decoding optimization strategies are introduced to direct the attention of the LLM model towards punctuation errors and address the issue of uncontrollable output. Experimental results show that the F1 score of the proposed method exceeds the baseline model by 14.18%, indicating a significant improvement in performance.
Anthology ID:
2024.lt4hala-1.32
Volume:
Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
Month:
May
Year:
2024
Address:
Torino, Italia
Editors:
Rachele Sprugnoli, Marco Passarotti
Venues:
LT4HALA | WS
SIG:
Publisher:
ELRA and ICCL
Note:
Pages:
256–260
Language:
URL:
https://aclanthology.org/2024.lt4hala-1.32
DOI:
Bibkey:
Cite (ACL):
Tian Xia, Kai Yu, Qianrong Yu, and Xinran Peng. 2024. SPEADO: Segmentation and Punctuation for Ancient Chinese Texts via Example Augmentation and Decoding Optimization. In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, pages 256–260, Torino, Italia. ELRA and ICCL.
Cite (Informal):
SPEADO: Segmentation and Punctuation for Ancient Chinese Texts via Example Augmentation and Decoding Optimization (Xia et al., LT4HALA-WS 2024)
Copy Citation:
PDF:
https://preview.aclanthology.org/nschneid-patch-2/2024.lt4hala-1.32.pdf