Qing Ma

Extraction of Broad-Scale, High-Precision Japanese-English Parallel Translation Expressions Using Lexical Information and Rules
Qing Ma | Shinya Sakagami | Masaki Murata
Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation

2008

Non-Factoid Japanese Question Answering through Passage Retrieval that Is Weighted Based on Types of Answers
Masaki Murata | Sachiyo Tsukawaki | Toshiyuki Kanamaru | Qing Ma | Hitoshi Isahara
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-II

Word Alignment Annotation in a Japanese-Chinese Parallel Corpus
Yujie Zhang | Zhulong Wang | Kiyotaka Uchimoto | Qing Ma | Hitoshi Isahara
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Parallel corpora are critical resources for machine translation research and development because they contain translation equivalences of various granularities. Manual annotation of word and phrase alignment is important because it provides a gold standard for developing and evaluating both example-based and statistical machine translation models. This paper presents the work of word and phrase alignment annotation in the NICT Japanese-Chinese parallel corpus, which is being constructed at the National Institute of Information and Communications Technology (NICT). We describe the specification of word alignment annotation and the tools specially developed for the manual annotation. The manual annotation of 17,000 sentence pairs has been completed. We examined the manually annotated word alignment data and extracted translation knowledge from the word- and phrase-aligned corpus.

Selection of Japanese-English Equivalents by Integrating High-quality Corpora and Huge Amounts of Web Data
Qing Ma | Koichi Nakao | Masaki Murata | Hitoshi Isahara
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

As a first step toward developing systems that enable non-native speakers to output near-perfect English sentences for given mixed English-Japanese sentences, we propose new approaches for selecting English equivalents by using the number of hits for various contexts in large English corpora. As the large English corpora, we used not only huge amounts of Web data but also large, manually compiled, high-quality English corpora. Using high-quality corpora enables us to select equivalents accurately, and using huge amounts of Web data enables us to resolve the shortage of hits that normally occurs when using only high-quality corpora. The types and lengths of the contexts used to select equivalents are variable and are optimally determined according to the number of hits in the corpora, so that performance can be further refined. Computer experiments showed that the precision of our methods was much higher than that of the existing methods for equivalent selection.

2007

Building Japanese-Chinese translation dictionary based on EDR Japanese-English bilingual dictionary
Yujie Zhang | Qing Ma | Hitoshi Isahara
Proceedings of Machine Translation Summit XI: Papers

2006

Semantic Analysis of Abstract Nouns to Compile a Thesaurus of Adjectives
Kyoko Kanzaki | Qing Ma | Eiko Yamamoto | Hitoshi Isahara
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Aiming to compile a thesaurus of adjectives, we discuss how to extract abstract nouns that categorize adjectives, clarify the semantic and syntactic functions of these abstract nouns, and manually evaluate the capability to extract instance-category relations. We focused on some Japanese syntactic structures and used the possibility of omitting the abstract noun to decide whether or not a semantic relation between an adjective and an abstract noun is an instance-category relation. For 63% of the adjectives (57 groups/90 groups) in our experiments, our extracted categories were found to be most suitable. For 22% of the adjectives (20/90), the categories in the EDR lexicon were found to be most suitable. For 14% of the adjectives (13/90), neither our extracted categories nor those in EDR were found to be suitable, or the examinees' own categories were considered to be more suitable. From our experimental results, we found that the correspondence between a group of adjectives and their category name was more suitable in our method than in the EDR lexicon.

2005

Building an Annotated Japanese-Chinese Parallel Corpus – A Part of NICT Multilingual Corpora
Yujie Zhang | Kiyotaka Uchimoto | Qing Ma | Hitoshi Isahara
Proceedings of Machine Translation Summit X: Papers

We are constructing a Japanese-Chinese parallel corpus, which is a part of the NICT Multilingual Corpora. The corpus is general-domain and large-scale (about 40,000 sentence pairs), consists of long sentences, and is of high quality with detailed annotation. To the best of our knowledge, this will be the first annotated Japanese-Chinese parallel corpus in the world. We created the corpus by selecting Japanese sentences from the Mainichi Newspaper and then manually translating them into Chinese. We then annotated the corpus with morphological and syntactic structures and with alignments at the word and phrase levels. This paper describes the specifications for human translation and detailed information annotation, and the tools we developed in the project. We also describe the experience we gained and the points to which we paid special attention, to share them with other researchers working on corpus construction.

A Multi-aligner for Japanese-Chinese Parallel Corpora
Yujie Zhang | Qun Liu | Qing Ma | Hitoshi Isahara
Proceedings of Machine Translation Summit X: Papers

Automatic word alignment is an important technology for extracting translation knowledge from parallel corpora. However, automatic techniques cannot resolve this problem completely because of variances in translations. We therefore need to investigate the performance potential of automatic word alignment and then decide how to suitably apply it. In this paper we first propose a lexical knowledge-based approach to word alignment on a Japanese-Chinese corpus. Then we evaluate the performance of the proposed approach on the corpus. At the same time we also apply a statistics-based approach, the well-known toolkit GIZA++, to the same test data. Through comparison of the performances of the two approaches, we propose a multi-aligner, exploiting the lexical knowledge-based aligner and the statistics-based aligner at the same time. Quantitative results confirmed the effectiveness of the multi-aligner.

Building an Annotated Japanese-Chinese Parallel Corpus - A Part of NICT Multilingual Corpora
Yujie Zhang | Kiyotaka Uchimoto | Qing Ma | Hitoshi Isahara
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

Information Retrieval Capable of Visualization and High Precision
Qing Ma | Kousuke Enomoto | Masaki Murata | Hitoshi Isahara
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

Trend Survey on Japanese Natural Language Processing Studies over the Last Decade
Masaki Murata | Koji Ichii | Qing Ma | Tamotsu Shirado | Toshiyuki Kanamaru | Hitoshi Isahara
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts

Analysis of Machine Translation Systems’ Errors in Tense, Aspect, and Modality
Masaki Murata | Kiyotaka Uchimoto | Qing Ma | Toshiyuki Kanamaru | Hitoshi Isahara
Proceedings of the 19th Pacific Asia Conference on Language, Information and Computation