Shinichiro Miyazawa

2005

pdf abs
Quality Analysis of Patent Parallel Corpus by the Scale
Isamu Okada | Shinichiro Miyazawa | Kazunari Ishida | Nobuhiko Shimizu | Toshizumi Ohta
Workshop on patent translation

Large-scale parallel corpus is extremely important for translation memory, example-based machine translation, and the support system to create English sentences. Organized collection or establishment of large-scale corpus is currently ongoing; however it is a difficult project in terms of copyrights as well as economic efficiency. To investigate general tendency of large-scale corpus helps to improve economical efficiency of parallel corpus collection as well as system establishment. In this study, therefore, the relationship between the scale of parallel corpus and the degree of correspondence is clarified, using parallel corpus for patents.

2001

Evaluation of machine translation is one of the most important issues in this field. We have already proposed a quantitative evaluation of machine translation system. The method was roughly that an example sentence in Japanese is machine translated into English, and then into Japanese using several systems, and that the comparison of output Japanese sentences with the original Japanese sentence is done for the word identification, the correctness of the modification, the syntactic dependency, and the parataxis. By calculating the score, we could quantitatively evaluate the English machine translation. However, the extraction of word identification etc. was done by human, and the fact affects the correctness of evaluation. In order to solve this problem, we developed an automatic evaluation system. We report the detail of the system in this paper..

1999

Compared with off-line machine translation (MT). MT for the WWW has more evaluation factors such as translation accuracy of text, interpretation of HTML tags, consistency with various protocols and browsers, and translation speed for net surfing. Moreover, the speed of technical innovation and its practical application is fast, including the appearance of new protocols. Improvement of MT software for the WWW will enable the sharing of information from around the world and make a great deal of contribution to mankind. Despite the importance of general evaluation studies on MT software for the WWW. it appears that such studies have not yet been conducted. Since MT for the WWW will be a critical factor for future international communication, its study and evaluation is an important theme. This study aims at standardized evaluation of MT for the WWW. and suggests an evaluation method focusing on unique aspects of the WWW independent of text. This evaluation method has a wide range of aptitude without depending on specific languages. Twenty-four items specific to the WWW were actually evaluated with regard to six MT software for the WWW. This study clarified various issues which should be improved in the future regarding MT software for the WWW and issues on evaluation technology of MT on the Internet.

One of the most important issues in the field of machine translation is evaluation of the translated sentences. This paper proposes a quantitative method of evaluation for machine translation systems. The method is as follows. First, an example sentence in Japanese is machine translated into English using several Japanese-English machine translation systems. Second, the output English sentences are machine translated into Japanese using several English-Japanese machine translation systems (different from the Japanese-English machine translation systems). Then, each output Japanese sentence is compared with the original Japanese sentence in terms of word identification, correctness of the modification, syntactic dependency, and parataxes. An average score is calculated, and this becomes the total evaluation of the machine translation of the sentence. From this two-way machine translation and the calculation of the score, we can quantitatively evaluate the English machine translation. For the present study, we selected 100 Japanese sentences from the abstracts of scientific articles. Each of these sentences has an English translation which was performed by a human. Approximately half of these sentences are evaluated and the results are given. In addition, a comparison of human and machine translations is also performed and the trade-off between the two methods of translation is discussed.

Co-authors

Yoshiko Shirokizawa 3

Venues

mtsummit4