Sayoko Kaide
2010
A Person-Name Filter for Automatic Compilation of Bilingual Person-Name Lexicons
Satoshi Sato
|
Sayoko Kaide
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
This paper proposes a simple and fast person-name filter, which plays an important role in automatic compilation of a large bilingual person-name lexicon. This filter is based on pn_score, which is the sum of two component scores, the score of the first name and that of the last name. Each score is calculated from two term sets: one is a dense set in which most of the members are person names; another is a baseline set that contains less person names. The pn_score takes one of five values, {+2, +1, 0, -1, -2}, which correspond to strong positive, positive, undecidable, negative, and strong negative, respectively. This pn_score can be easily extended to bilingual pn_score that takes one of nine values, by summing scores of two languages. Experimental results show that our method works well for monolingual person names in English and Japanese; the F-score of each language is 0.929 and 0.939, respectively. The performance of the bilingual person-name filter is better; the F-score is 0.955.