Constructivist Tokenization for English

Allison Fan; Weiwei Sun


Abstract

This paper revisits tokenization from a theoretical perspective and argues that a constructivist approach to tokenization is necessary for semantic parsing and for modeling language acquisition. We consider two problems: (1) (semi-)automatically converting existing lexicalist annotations, e.g. those of the Penn TreeBank, into constructivist annotations, and (2) automatic tokenization of raw texts. We demonstrate that (1) a heuristic rule-based constructivist tokenizer is able to yield relatively satisfactory accuracy when gold standard Penn TreeBank part-of-speech tags are available, but that some manual annotations are still necessary to obtain gold standard results, and (2) a neural tokenizer is able to provide accurate automatic constructivist tokenization results from raw character sequences. Our research output also includes a set of high-quality morpheme-tokenized corpora, which enable the training of computational models that more closely align with language comprehension and acquisition.

Anthology ID: 2023.cxgsnlp-1.5
Volume: Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023)
Month: March
Year: 2023
Address: Washington, D.C.
Editors: Claire Bonial, Harish Tayyar Madabushi
Venues: CxGsNLP | SyntaxFest
SIG: SIGPARSE
Publisher: Association for Computational Linguistics
Pages: 36–40
URL: https://aclanthology.org/2023.cxgsnlp-1.5
Cite (ACL): Allison Fan and Weiwei Sun. 2023. Constructivist Tokenization for English. In Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023), pages 36–40, Washington, D.C. Association for Computational Linguistics.
Cite (Informal): Constructivist Tokenization for English (Fan & Sun, CxGsNLP-SyntaxFest 2023)
PDF: https://preview.aclanthology.org/emnlp-22-attachments/2023.cxgsnlp-1.5.pdf
