A Self-Referring Quantitative Evaluation of the ATR Basic Travel Expression Corpus (BTEC)

Kyo Kageura, Genichiro Kikui


Abstract
In this paper we evaluate the Basic Travel Expression Corpus (BTEC), developed by ATR (Advanced Telecommunication Research Laboratory), Japan. BTEC was specifically developed as a wide-coverage, consistent corpus containing basic Japanese travel expressions with English counterparts, for the purpose of providing basic data for the development of high quality speech translation systems. To evaluate the corpus, we introduce a quantitative method for evaluating the sufficiency of qualitatively well-defined corpora, on the basis of LNRE methods that can estimate the potential growth patterns of various sparse data by fitting various skewed distributions such as the Zipfian group of distributions, lognormal distribution, and inverse Gauss-Poisson distribution to them. The analyses show the coverage of lexical items of BTEC vis-a-vis the possible targets implicitly defined by the corpus itself, and thus provides basic insights into strategies for enhancing BTEC in future.
Anthology ID:
L06-1025
Volume:
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Month:
May
Year:
2006
Address:
Genoa, Italy
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/55_pdf.pdf
DOI:
Bibkey:
Cite (ACL):
Kyo Kageura and Genichiro Kikui. 2006. A Self-Referring Quantitative Evaluation of the ATR Basic Travel Expression Corpus (BTEC). In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
Cite (Informal):
A Self-Referring Quantitative Evaluation of the ATR Basic Travel Expression Corpus (BTEC) (Kageura & Kikui, LREC 2006)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2006/pdf/55_pdf.pdf