Abstract
This paper proposes SPEADO, a model for sentence segmentation and punctuation of ancient Chinese texts, which incorporates text chunking and MinHash indexing to realise example augmentation. In addition, decoding optimization strategies are introduced to direct the LLM's attention towards punctuation errors and to address the issue of uncontrollable output. Experimental results show that the F1 score of the proposed method exceeds that of the baseline model by 14.18%, indicating a significant improvement in performance.
- Anthology ID:
- 2024.lt4hala-1.32
- Volume:
- Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024
- Month:
- May
- Year:
- 2024
- Address:
- Torino, Italia
- Editors:
- Rachele Sprugnoli, Marco Passarotti
- Venues:
- LT4HALA | WS
- Publisher:
- ELRA and ICCL
- Pages:
- 256–260
- URL:
- https://aclanthology.org/2024.lt4hala-1.32
- Cite (ACL):
- Tian Xia, Kai Yu, Qianrong Yu, and Xinran Peng. 2024. SPEADO: Segmentation and Punctuation for Ancient Chinese Texts via Example Augmentation and Decoding Optimization. In Proceedings of the Third Workshop on Language Technologies for Historical and Ancient Languages (LT4HALA) @ LREC-COLING-2024, pages 256–260, Torino, Italia. ELRA and ICCL.
- Cite (Informal):
- SPEADO: Segmentation and Punctuation for Ancient Chinese Texts via Example Augmentation and Decoding Optimization (Xia et al., LT4HALA-WS 2024)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-2/2024.lt4hala-1.32.pdf