Abstract
Sentence acceptability judgment assesses to what degree a sentence is acceptable to native speakers of the language. Most unsupervised prediction approaches rely on a language model to obtain a sentence likelihood that reflects acceptability. However, two problems exist: first, low-frequency words can have a significant negative impact on the sentence likelihood derived from the language model; second, when multiple domains are involved, the language model needs to be trained on domain-specific text for domain adaptation. To address both problems, we propose a simple method that substitutes Part-of-Speech (POS) tags for low-frequency words in the sentences used for continual training of masked language models. Experimental results show that our word-tag-hybrid BERT model brings improvement on both a sentence acceptability benchmark and a cross-domain sentence acceptability evaluation corpus. Furthermore, our annotated cross-domain sentence acceptability evaluation corpus may benefit future research.
- Anthology ID:
- 2022.aacl-short.25
- Volume:
- Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)
- Month:
- November
- Year:
- 2022
- Address:
- Online only
- Editors:
- Yulan He, Heng Ji, Sujian Li, Yang Liu, Chua-Hui Chang
- Venues:
- AACL | IJCNLP
- Publisher:
- Association for Computational Linguistics
- Pages:
- 194–201
- URL:
- https://aclanthology.org/2022.aacl-short.25
- Cite (ACL):
- Yang Zhao and Issei Yoshida. 2022. A Simple Yet Effective Hybrid Pre-trained Language Model for Unsupervised Sentence Acceptability Prediction. In Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 194–201, Online only. Association for Computational Linguistics.
- Cite (Informal):
- A Simple Yet Effective Hybrid Pre-trained Language Model for Unsupervised Sentence Acceptability Prediction (Zhao & Yoshida, AACL-IJCNLP 2022)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2022.aacl-short.25.pdf
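
The substitution step described in the abstract, replacing low-frequency words with their POS tags before continual training, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `hybridize`, the `min_count` threshold, the tag labels, and the choice to count frequencies on the same pre-tagged corpus. How the tags are added to the BERT vocabulary and how the masked language model is then trained is not covered.

```python
from collections import Counter


def hybridize(tagged_sentences, min_count=2):
    """Replace words seen fewer than `min_count` times with their POS tag.

    `tagged_sentences` is a list of sentences, each a list of (word, tag)
    pairs. Both the input format and the threshold are illustrative
    assumptions, not the paper's actual settings.
    """
    # Count case-insensitive word frequencies over the whole corpus.
    counts = Counter(
        word.lower() for sent in tagged_sentences for word, _ in sent
    )
    # Keep frequent words as-is; swap rare ones for their tag.
    return [
        [word if counts[word.lower()] >= min_count else tag
         for word, tag in sent]
        for sent in tagged_sentences
    ]


# Toy pre-tagged corpus (hypothetical data and tag labels).
corpus = [
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("ruminates", "VERB")],
]
hybrid = hybridize(corpus, min_count=2)
```

In this toy corpus, "sleeps" and "ruminates" each occur once and are replaced by their tag, while "the" and "cat" are kept. The resulting word-tag sequences would then serve as training text for the masked language model.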