@inproceedings{komiya-shinnou-2018-investigating,
title = "Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus",
author = "Komiya, Kanako and
Shinnou, Hiroyuki",
editor = "Haffari, Reza and
Cherry, Colin and
Foster, George and
Khadivi, Shahram and
Salehi, Bahar",
booktitle = "Proceedings of the Workshop on Deep Learning Approaches for Low-Resource {NLP}",
month = jul,
year = "2018",
address = "Melbourne",
publisher = "Association for Computational Linguistics",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/W18-3408/",
doi = "10.18653/v1/W18-3408",
pages = "60--67",
abstract = "Fine-tuning is a popular method to achieve better performance when only a small target corpus is available. However, it requires tuning of a number of metaparameters and thus it might carry risk of adverse effect when inappropriate metaparameters are used. Therefore, we investigate effective parameters for fine-tuning when only a small target corpus is available. In the current study, we target at improving Japanese word embeddings created from a huge corpus. First, we demonstrate that even the word embeddings created from the huge corpus are affected by domain shift. After that, we investigate effective parameters for fine-tuning of the word embeddings using a small target corpus. We used perplexity of a language model obtained from a Long Short-Term Memory network to assess the word embeddings input into the network. The experiments revealed that fine-tuning sometimes give adverse effect when only a small target corpus is used and batch size is the most important parameter for fine-tuning. In addition, we confirmed that effect of fine-tuning is higher when size of a target corpus was larger."
}
Markdown (Informal)
[Investigating Effective Parameters for Fine-tuning of Word Embeddings Using Only a Small Corpus](https://preview.aclanthology.org/jlcl-multiple-ingestion/W18-3408/) (Komiya & Shinnou, ACL 2018)