Language Resource Addition: Dictionary or Corpus?

Shinsuke Mori, Graham Neubig


Abstract
In this paper, we investigate the relative effect of two strategies for adding language resources on word segmentation and part-of-speech tagging in Japanese. The first strategy adds entries to the dictionary; the second adds annotated sentences to the training corpus. The experimental results show that adding annotated sentences to the training corpus is more effective than adding entries to the dictionary. Adding annotated sentences is especially efficient when new words are added together with the contexts of three real occurrences, as partially annotated sentences. Based on this finding, we annotated invention disclosure texts and observed the resulting word segmentation accuracy.
Anthology ID:
L14-1515
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1631–1636
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/648_Paper.pdf
Cite (ACL):
Shinsuke Mori and Graham Neubig. 2014. Language Resource Addition: Dictionary or Corpus? In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 1631–1636, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Language Resource Addition: Dictionary or Corpus? (Mori & Neubig, LREC 2014)
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/648_Paper.pdf