Abstract
We train a diachronic long short-term memory (LSTM) part-of-speech tagger on a large corpus of American English from the 19th, 20th, and 21st centuries. We analyze the tagger’s ability to implicitly learn temporal structure between years, and the extent to which this knowledge can be transferred to date new sentences. The learned year embeddings show a strong linear correlation between their first principal component and time. We show that temporal information encoded in the model can be used to predict novel sentences’ years of composition relatively well. Comparisons to a feedforward baseline suggest that the temporal change learned by the LSTM is syntactic rather than purely lexical. Thus, our results suggest that our tagger is implicitly learning to model syntactic change in American English over the course of the 19th, 20th, and early 21st centuries.
- Anthology ID: W19-4721
- Volume: Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change
- Month: August
- Year: 2019
- Address: Florence, Italy
- Editors: Nina Tahmasebi, Lars Borin, Adam Jatowt, Yang Xu
- Venue: LChange
- Publisher: Association for Computational Linguistics
- Pages: 167–174
- URL: https://aclanthology.org/W19-4721
- DOI: 10.18653/v1/W19-4721
- Cite (ACL): William Merrill, Gigi Stark, and Robert Frank. 2019. Detecting Syntactic Change Using a Neural Part-of-Speech Tagger. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, pages 167–174, Florence, Italy. Association for Computational Linguistics.
- Cite (Informal): Detecting Syntactic Change Using a Neural Part-of-Speech Tagger (Merrill et al., LChange 2019)
- PDF: https://preview.aclanthology.org/emnlp22-frontmatter/W19-4721.pdf
- Code: viking-sudo-rm/DiachronicPOSTagger