Building a Database of Japanese Adjective Examples from Special Purpose Web Corpora

Masaya Yamaguchi


Abstract
It is often difficult to collect many examples of low-frequency words from a single general purpose corpus. In this paper, I present a method of building a database of Japanese adjective examples from special purpose Web corpora (SPW corpora), and I investigate the characteristics of the examples in the database by comparing them with examples collected from a general purpose Web corpus (GPW corpus). My proposed method constructs an SPW corpus for each adjective so as to collect examples that have the following features: (i) they are unbiased, and (ii) the distribution of examples extracted from every SPW corpus closely resembles that of examples extracted from a GPW corpus. The experimental results show the following: (i) my proposed method can collect many examples rapidly, as the number of examples extracted from SPW corpora is more than 8.0 times (median value) the number extracted from the GPW corpus; and (ii) the distributions of co-occurrence words for adjectives in the database are similar to those taken from the GPW corpus.
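The two quantitative claims in the abstract, a per-adjective ratio of example counts summarized by its median and a similarity between co-occurrence distributions, can be made concrete with a small sketch. The abstract does not state how the distributions were compared, so the code below assumes cosine similarity over raw co-occurrence frequencies purely for illustration; the function names and the toy counts are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt
from statistics import median


def example_ratio_median(spw_counts: dict[str, int], gpw_counts: dict[str, int]) -> float:
    """Median, over adjectives, of (#examples from SPW corpus) / (#examples from GPW corpus).

    Adjectives with no GPW examples are skipped to avoid division by zero.
    """
    ratios = [
        spw_counts[adj] / gpw_counts[adj]
        for adj in spw_counts
        if gpw_counts.get(adj, 0) > 0
    ]
    return median(ratios)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two co-occurrence frequency vectors.

    This measure is an assumption for illustration; the paper may use another.
    """
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical example counts per adjective (not data from the paper).
    spw = {"adj_a": 90, "adj_b": 40, "adj_c": 300}
    gpw = {"adj_a": 10, "adj_b": 8, "adj_c": 25}
    print(example_ratio_median(spw, gpw))  # 9.0 (median of 9.0, 5.0, 12.0)

    # Hypothetical co-occurring nouns for one adjective in each corpus.
    spw_cooc = Counter({"花": 30, "景色": 20, "声": 10})
    gpw_cooc = Counter({"花": 3, "景色": 2, "声": 1})
    print(cosine_similarity(spw_cooc, gpw_cooc))  # close to 1.0: proportional distributions
```

Because cosine similarity ignores overall magnitude, a much larger SPW corpus can still score as highly similar to the GPW corpus as long as its relative co-occurrence frequencies match, which is the property the abstract's second claim is about.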
Anthology ID:
L14-1058
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
3684–3688
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/1075_Paper.pdf
Cite (ACL):
Masaya Yamaguchi. 2014. Building a Database of Japanese Adjective Examples from Special Purpose Web Corpora. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3684–3688, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Building a Database of Japanese Adjective Examples from Special Purpose Web Corpora (Yamaguchi, LREC 2014)