Grouping Words with Semantic Diversity

Karine Chubarian, Abdul Rafae Khan, Anastasios Sidiropoulos, Jia Xu


Abstract
Deep learning-based NLP systems can be sensitive to unseen tokens and hard to train on high-dimensional inputs, both of which critically hinder generalization. We introduce an approach that groups input words by their semantic diversity, simplifying the input language representation while keeping ambiguity low. Because semantically diverse words occur in different contexts, we can substitute words with their groups and still distinguish word meanings from their contexts. We design several algorithms that compute diverse groupings based on random sampling, geometric distances, and entropy maximization, and we prove formal guarantees for the entropy-based algorithms. Experimental results show that our methods improve the generalization of NLP models, with improved accuracy on POS tagging and language modeling (LM) tasks and significant improvements on medium-scale machine translation tasks, up to +6.5 BLEU points. Our source code is available at https://github.com/abdulrafae/dg.
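To make the idea of substituting words with their groups concrete, here is a minimal Python sketch of the simplest variant the abstract names, grouping by random sampling. It is an illustration under stated assumptions, not the authors' implementation: the function names (`random_grouping`, `substitute`, `group_entropy`), the toy corpus, the `G<k>` label format, and the choice of three groups are all hypothetical. The entropy computed here is the plain Shannon entropy of the group-label distribution, shown only as one quantity such a grouping could be scored by; the paper's entropy-maximization objective and its guarantees are not reproduced.

```python
import math
import random
from collections import Counter


def random_grouping(vocab, num_groups, seed=0):
    """Assign each word to one of num_groups groups uniformly at random.

    Hypothetical helper: a stand-in for a random-sampling grouping,
    not the algorithm from the paper.
    """
    rng = random.Random(seed)
    # Sort the vocabulary so the assignment is reproducible for a given seed.
    return {word: rng.randrange(num_groups) for word in sorted(vocab)}


def substitute(tokens, word_to_group):
    """Replace each token with its group label, e.g. 'G2'."""
    return [f"G{word_to_group[tok]}" for tok in tokens]


def group_entropy(tokens, word_to_group):
    """Shannon entropy (in bits) of the group labels over the token stream."""
    counts = Counter(word_to_group[tok] for tok in tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat while the dog sat on the rug".split()
grouping = random_grouping(set(corpus), num_groups=3)
grouped = substitute(corpus, grouping)
entropy = group_entropy(corpus, grouping)

print(grouped)
print(f"group entropy: {entropy:.3f} bits")
```

A downstream model would then be trained on `grouped` rather than `corpus`, which shrinks the input vocabulary from the number of distinct words to the number of groups.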
Anthology ID:
2021.naacl-main.257
Volume:
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Month:
June
Year:
2021
Address:
Online
Editors:
Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, Yichao Zhou
Venue:
NAACL
Publisher:
Association for Computational Linguistics
Pages:
3217–3228
URL:
https://preview.aclanthology.org/build-pipeline-with-new-library/2021.naacl-main.257/
DOI:
10.18653/v1/2021.naacl-main.257
Cite (ACL):
Karine Chubarian, Abdul Rafae Khan, Anastasios Sidiropoulos, and Jia Xu. 2021. Grouping Words with Semantic Diversity. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 3217–3228, Online. Association for Computational Linguistics.
Cite (Informal):
Grouping Words with Semantic Diversity (Chubarian et al., NAACL 2021)
PDF:
https://preview.aclanthology.org/build-pipeline-with-new-library/2021.naacl-main.257.pdf
Optional supplementary data:
 2021.naacl-main.257.OptionalSupplementaryData.zip
Optional supplementary code:
 2021.naacl-main.257.OptionalSupplementaryCode.pdf
Video:
 https://preview.aclanthology.org/build-pipeline-with-new-library/2021.naacl-main.257.mp4
Code:
 abdulrafae/dg