Robust to Noise Models in Natural Language Processing Tasks

Valentin Malykh


Abstract
A person in modern life is surrounded by a lot of noisy text. The traditional approach is to apply spelling correction, yet existing solutions are far from perfect. We propose a noise-robust word embedding model that outperforms commonly used models such as fasttext and word2vec on different tasks. In addition, we investigate the noise robustness of current models in different natural language processing tasks. We propose extensions for modern models in three downstream tasks, i.e., text classification, named entity recognition, and aspect extraction, which show improved noise robustness over existing solutions.
Anthology ID:
P19-2002
Volume:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
Month:
July
Year:
2019
Address:
Florence, Italy
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
10–16
URL:
https://aclanthology.org/P19-2002
DOI:
10.18653/v1/P19-2002
Cite (ACL):
Valentin Malykh. 2019. Robust to Noise Models in Natural Language Processing Tasks. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pages 10–16, Florence, Italy. Association for Computational Linguistics.
Cite (Informal):
Robust to Noise Models in Natural Language Processing Tasks (Malykh, ACL 2019)
PDF:
https://preview.aclanthology.org/ingestion-script-update/P19-2002.pdf
Code
madrugado/robust-w2v