Distinguishing Japanese Non-standard Usages from Standard Ones

Tatsuya Aoki, Ryohei Sasano, Hiroya Takamura, Manabu Okumura


Abstract
We focus on non-standard usages of common words on social media. On social media, words sometimes take on usages that are entirely different from their original ones. In this study, we attempt to distinguish non-standard usages on social media from standard ones in an unsupervised manner. Our basic idea is that non-standardness can be measured by the inconsistency between the expected meaning of the target word and the given context. For this purpose, we use context embeddings derived from word embeddings. Our experimental results show that the model leveraging context embeddings outperforms other methods, and they also yield findings on, for example, how to construct context embeddings and which corpus to use.
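The basic idea in the abstract, scoring how well a target word's expected meaning agrees with its surrounding context, can be illustrated with a minimal sketch. This is not the authors' model: it assumes the context embedding is a plain average of context word vectors and that agreement is measured by cosine similarity, and the vocabulary, vectors, and function names (`context_embedding`, `standardness_score`, `TOY_EMBEDDINGS`) are all hypothetical, invented for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def context_embedding(context_words, embeddings):
    """Assumption: the context is represented by the mean of its word vectors."""
    vectors = [embeddings[w] for w in context_words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def standardness_score(target, context_words, embeddings):
    """Higher score = context agrees with the target's expected meaning.

    Returns None when the target or every context word is out of vocabulary.
    """
    ctx = context_embedding(context_words, embeddings)
    if ctx is None or target not in embeddings:
        return None
    return cosine(embeddings[target], ctx)


# Hypothetical 3-dimensional toy vectors, not learned from any corpus.
TOY_EMBEDDINGS = {
    "mackerel": [0.9, 0.1, 0.0],
    "grill": [0.8, 0.2, 0.1],
    "salt": [0.85, 0.1, 0.05],
    "login": [0.0, 0.2, 0.9],
    "lag": [0.1, 0.1, 0.95],
}

# A food-related context agrees with the expected meaning of "mackerel";
# a computing-related context does not, so it receives a lower score.
standard = standardness_score("mackerel", ["grill", "salt"], TOY_EMBEDDINGS)
non_standard = standardness_score("mackerel", ["login", "lag"], TOY_EMBEDDINGS)
print(standard > non_standard)
```

In an unsupervised setting such as the one the abstract describes, a score like this could be thresholded or ranked across occurrences of a word; how the context embedding is built and which corpus the word embeddings come from are exactly the choices the paper reports findings on.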
Anthology ID:
D17-1246
Volume:
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing
Month:
September
Year:
2017
Address:
Copenhagen, Denmark
Editors:
Martha Palmer, Rebecca Hwa, Sebastian Riedel
Venue:
EMNLP
SIG:
SIGDAT
Publisher:
Association for Computational Linguistics
Pages:
2323–2328
URL:
https://aclanthology.org/D17-1246
DOI:
10.18653/v1/D17-1246
Cite (ACL):
Tatsuya Aoki, Ryohei Sasano, Hiroya Takamura, and Manabu Okumura. 2017. Distinguishing Japanese Non-standard Usages from Standard Ones. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2323–2328, Copenhagen, Denmark. Association for Computational Linguistics.
Cite (Informal):
Distinguishing Japanese Non-standard Usages from Standard Ones (Aoki et al., EMNLP 2017)
PDF:
https://preview.aclanthology.org/fix-dup-bibkey/D17-1246.pdf