Yasunori Abe
2008
Extraction of Informative Expressions from Domain-specific Documents
Eiko Yamamoto
|
Hitoshi Isahara
|
Akira Terada
|
Yasunori Abe
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
What kinds of lexical resources are helpful for extracting useful information from domain-specific documents? Although domain-specific documents contain much useful knowledge, it is not obvious how to extract such knowledge efficiently from the documents. We need to develop techniques for extracting hidden information from such domain-specific documents. These techniques do not necessarily use state-of-the-art technologies and achieve deep and accurate language understanding, but are based on huge amounts of linguistic resources, such as domain-specific lexical databases. In this paper, we introduce two techniques for extracting informative expressions from documents: the extraction of related words that are not only taxonomically related but also thematically related, and the acquisition of salient terms and phrases. With these techniques we then attempt to automatically and statistically extract domain-specific informative expressions in aviation documents as an example and evaluate the results.
Application of Resource-based Machine Translation to Real Business Scenes
Hitoshi Isahara
|
Masao Utiyama
|
Eiko Yamamoto
|
Akira Terada
|
Yasunori Abe
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
As huge quantities of documents have become available, services using natural language processing technologies trained by huge corpora have emerged, such as information retrieval and information extraction. In this paper we verify the usefulness of resource-based, or corpus-based, translation in the aviation domain as a real business situation. This study is important from both a business perspective and an academic perspective. Intuitively, manuals for similar products, or manuals for different versions of the same product, are likely to resemble each other. Therefore, even with only a small training data, a corpus-based MT system can output useful translations. The corpus-based approach is powerful when the target is repetitive. Manuals for similar products, or manuals for different versions of the same product, are real-world documents that are repetitive. Our experiments on translation of manual documents are still in a beginning stage. However, the BLEU score from very small number of training sentences is already rather high. We believe corpus-based machine translation is a player full of promise in this kind of actual business scene.