2012
Statistical Machine Translation without Source-side Parallel Corpus Using Word Lattice and Phrase Extension
Takanori Kusumoto, Tomoyosi Akiba
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Statistical machine translation (SMT) requires a parallel corpus between the source and target languages. A pivot-translation approach can be applied to a language pair that has no direct parallel corpus, but it requires both source-pivot and pivot-target parallel corpora. We propose a novel approach for applying SMT to a resource-limited source language that has no parallel corpus, only a word dictionary into the pivot language. Dictionary-based translation suffers from ambiguity and incompleteness. To deal with the ambiguity, the proposed method represents the pivot-language candidates as a word lattice and applies word lattice decoding. To compensate for the incompleteness, it expands the lattice by using a pivot-target phrase translation table. Our experimental evaluation showed that this approach is promising for applying SMT even when a source-side parallel corpus is lacking.
Designing an Evaluation Framework for Spoken Term Detection and Spoken Document Retrieval at the NTCIR-9 SpokenDoc Task
Tomoyosi Akiba, Hiromitsu Nishizaki, Kiyoaki Aikawa, Tatsuya Kawahara, Tomoko Matsui
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
We describe the evaluation framework for spoken document retrieval used in the IR for Spoken Documents Task of the ninth NTCIR Workshop. The task consisted of two subtasks: spoken term detection (STD) and ad hoc spoken document retrieval (SDR). Both subtasks target search terms, passages and documents in the academic and simulated lectures of the Corpus of Spontaneous Japanese. Seven teams participated in the STD subtask and five in the SDR subtask. We also discuss the results of the evaluation conducted at the workshop.
2010
Language Modeling Approach for Retrieving Passages in Lecture Audio Data
Koichiro Honda, Tomoyosi Akiba
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
Spoken Document Retrieval (SDR) is a promising technology for enhancing the utility of spoken materials. Once spoken documents have been transcribed by a Large Vocabulary Continuous Speech Recognition (LVCSR) decoder, a text-based ad hoc retrieval method can be applied directly to the transcriptions. However, recognition errors significantly degrade retrieval performance. To address this problem, we previously proposed a method that uses a statistical translation technique to fill the gap between automatically transcribed text and correctly transcribed text. In this paper, we extend that method by (1) using neighboring context to index the target passage and (2) applying a language modeling approach to document retrieval. Our experimental evaluation shows two things. First, context information can improve retrieval performance. Second, the language modeling approach is effective for incorporating context information into the proposed SDR method, which uses a translation model.
2008
Statistical Machine Translation based Passage Retrieval for Cross-Lingual Question Answering
Tomoyosi Akiba, Kei Shimizu, Atsushi Fujii
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II
Test Collections for Spoken Document Retrieval from Lecture Audio Data
Tomoyosi Akiba, Kiyoaki Aikawa, Yoshiaki Itoh, Tatsuya Kawahara, Hiroaki Nanjo, Hiromitsu Nishizaki, Norihito Yasuda, Yoichi Yamashita, Katunobu Itou
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The Spoken Document Processing Working Group, part of the Special Interest Group on Spoken Language Processing of the Information Processing Society of Japan, is developing a test collection for evaluating spoken document retrieval systems. A prototype of the test collection consists of three components: a set of textual queries, lists of relevant segments, and transcriptions produced by an automatic speech recognition system. Together, these enable retrieval from the Corpus of Spontaneous Japanese (CSJ). We started from about 100 initial queries and kept only those with more than five relevant segments, each about one minute of speech. This criterion left 39 queries. We also conducted an ad hoc retrieval experiment on the test collection with a standard spoken document retrieval method to establish baseline retrieval performance.
2006
Exploiting Dynamic Passage Retrieval for Spoken Question Recognition and Context Processing towards Speech-driven Information Access Dialogue
Tomoyosi Akiba
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)
Speech interfaces and dialogue processing abilities promise to improve the utility of open-domain question answering (QA). We propose a novel method for resolving the ambiguity problems that arise in such speech- and dialogue-enhanced QA tasks. The method exploits passage retrieval, one of the main components common to many QA systems. Its basic idea is that a candidate's similarity to some passage in the target documents can be used to select the appropriate question from the candidates. In this paper, we apply the method to two QA subtasks:
(1) N-best rescoring of LVCSR outputs in the speech-driven QA (SDQA) task, which selects the most appropriate candidate as the question sentence.
(2) Context processing in the information access dialogue (IAD) task, which composes a complete question sentence from a submitted incomplete one by using elements that appeared in the dialogue context.
For both subtasks, dynamic passage retrieval is introduced to further improve performance. The experimental results show that the proposed method is effective at improving QA performance in both tasks.
2004
Collecting Spontaneously Spoken Queries for Information Retrieval
Tomoyosi Akiba, Atsushi Fujii, Katunobu Itou
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04)
1994
A Bayesian Approach for User Modeling in Dialogue Systems
Tomoyosi Akiba, Hozumi Tanaka
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics