Makoto Nagao

Also published as: M. Nagao


2005

1999

1998

1997

In this paper is described a general framework of a next generation machine translation system which translates a text not sentence by sentence but by considering inter-sentential discourse. The method is a step closer to human translation than the present-day machine translation systems. Particularly important are a detailed discourse analysis and a flexible text generation by using information obtained from the discourse analysis.

1996

1995

We describe and evaluate experimentally a method to parse a tagged corpus without grammar modeling a natural language on context-free language. This method is based on the following three hypotheses. 1) Part-of-speech sequences on the right-hand side of a rewriting rule are less constrained as to what part-of-speech precedes and follows them than non-constituent sequences. 2) Part-of-speech sequences directly derived from the same non-terminal symbol have similar environments. 3) The most suitable set of rewriting rules makes the greatest reduction of the corpus size. Based on these hypotheses, the system finds a set of constituent-like part-of-speech sequences and replaces them with a new symbol. The repetition of these processes brings us a set of rewriting rules, a grammar, and the bracketed corpus.

1994

1993

A case structure expression is one of the most important forms to represent the meaning of a sentence. Case structure analysis is usually performed by consulting case frame information in verb dictionaries and by selecting a proper case frame for an input sentence. However, this analysis is very difficult because of word sense ambiguity and structural ambiguity. A conventional method for solving these problems is to use the method of selectional restriction, but this method has a drawback in the semantic marker (SM) system – the trade-off between descriptive power and construction cost. This paper describes a method of case structure analysis of Japanese sentences which overcomes the drawback in the SM system, concentrating on the structural disambiguation. This method selects a proper case frame for an input by the similarity measure between the input and typical example sentences of each case frame. When there are two or more possible readings for an input because of structural ambiguity, the best reading will be selected by evaluating case structures in each possible reading by the similarity measure with typical example sentences of case frames.

1992

1991

February 13-25, 1991

1990

1989

1988

1987

1986

1985

1984

A new software system for describing a grammar of a machine translation system has been developed. This software system is called GRADE (GRAmmar DEscriber). GRADE has the following features: 1. GRADE allows a grammar writer to divide a whole grammar into several parts. Each part of the grammar is called a subgrammar. A subgrammar describes a step of the translation process. A whole grammar is then described by a network of sub-grammars. This network is called a subgrammar network. A subgrammar network allows a grammar writer to control the process of the translation precisely. When a subgrammar network in the analysis phase consists of a subgrammar for a noun-phrase (SG1) and a subgrammar for a verb-phase (SG2) in this sequence, the subgrammar network first applies SG1 to an input sentence, then applies SG2 to the result of an application of SG1, thus getting a syntactic structure for the input sentence. 2. A subgrammar consists of a set of rewriting rules. Rewriting rules in a subgrammar are applied for an input sentence in an appropriate order, which is specified in the description of the subgrammar. A rewriting rule transforms a tree structure into another tree structure. Rewriting rules use a powerful pattern matching algorithm to test their applicability to a tree structure. For example, a grammar writer can write a pattern that recognizes and parses an arbitrary numbers of sub-trees. Each node of a tree-structure has a list of pairs of a property name and a property value. A node can express a category name, a semantic marker, flags to control the translation process, and various other information. This tree-to-tree transformation operation by GRADE allows a grammar writer to describe all the processes of analysis, transfer and generation of a machine translation system with this uniform description capability of GRADE. 3. A subgrammar network or a subgrammar can be written in an entry of the dictionaries for a machine translation system. A subgrammar network or a subgrammar written in a dictionary entry is called a dictionary rule, which is specific for a word. When an input sentence contains a word which has a dictionary rule, it is applied to an input sentence at an appropriate point of a translation process. It can express more precise processing appropriate for that specific word that a general Subgrammar Network or Subgrammar. it also allows grammar writers to adjust a machine translation system to a specific domain easily. 4. GRADE is written in LISP. GRADE is implemented on FACOM M-382 and Symbolics 3600. GRADE is used in the machine translation system between Japanese and English. The project was started by the Japanese government in 1982. The effectiveness of GRADE has been demonstrated in the project.

1982

1980

1976

1965