AMTA/SIG-IL First Workshop on Interlinguas
We present a machine translation framework in which the interlingua, Lexical Conceptual Structure (LCS), is coupled with a definitional component that includes bilingual (EuroWordNet) links between words in the source and target languages. While the links between individual words are language-specific, the LCS is designed to be a language-independent, compositional representation. We take the view that the two types of information (shallower, transfer-like knowledge as well as deeper, compositional knowledge) can be reconciled in interlingual machine translation, the former for overcoming the intractability of LCS-based lexical selection, and the latter for relating the underlying semantics of two words cross-linguistically. We describe the acquisition process for these two information types and present results of hand-verification of the acquired lexicon. Finally, we demonstrate the utility of the two information types in interlingual MT.
We describe a theoretical investigation into the semantic space described by our interlingua (IL), which currently has 191 main verb classes divided into 434 subclasses, represented by 237 distinct Lexical Conceptual Structures (LCSs). Using the model of aspect in Olsen (1994; 1997), monotonic aspectual composition, we have identified 71 aspectually basic subclasses that are associated with one or more of 68 aspectually non-basic classes via some lexical (“type-shifting”) rule (Bresnan, 1982; Pinker, 1984; Levin and Rappaport Hovav, 1995). This allows us to refine the IL and address certain computational and theoretical issues at the same time. (1) From a linguistic viewpoint, the expected benefits include a refinement of the aspectual model in Olsen (1994; 1997) (which provides necessary but not sufficient conditions for aspectual composition), and a refinement of the verb classifications in Levin (1993); we also expect our approach to eventually produce a systematic definition (in terms of LCSs and compositional operations) of the precise meaning components responsible for Levin's classification. (2) Computationally, the lexicon is made more compact.
This paper describes the characteristics of an interlingua we have developed. It contains a large lexicon and has been tested in real MT systems on the translation of large volumes of real documents. The main characteristics of the interlingua are as follows: (1) Conceptual primitives, the elements of the interlingua, can be linked to any part of speech in English or Japanese. (2) Positions of the top node in the interlingua correspond to differences in syntactic structures. (3) Two or more conceptual graphs can be used to express the same concept, and can be converted into one another by conceptual transformation rules that are independent of any specific language. (4) Conceptual primitives are divided into two classes: (a) functional conceptual primitives, which are finite and manageable and constitute, along with rules for interpreting conceptual graphs, the grammar of the interlingua, and (b) general conceptual primitives, which correspond to specific words in actual languages and which, depending on the direction of translation, may or may not be used. Our commercial MT products using the interlingua produce results of roughly the same or higher quality than systems using the syntactic transfer method, which indicates the feasibility of the interlingua approach.
This paper outlines the EDR Concept Dictionary and gives some examples of interlingual representations used as the semantic representation of an input sentence.
In this paper we report on experiments using WordNet synset tags to evaluate the semantic properties of the verb classes cataloged by Levin (1993). This paper represents ongoing research begun at the University of Pennsylvania (Rosenzweig and Dang, 1997; Palmer, Rosenzweig, and Dang, 1997) and the University of Maryland (Dorr and Jones, 1996b; Dorr and Jones, 1996a; Dorr and Jones, 1996c). Using WordNet sense tags to constrain the intersection of Levin classes, we avoid spurious class intersections introduced by homonymy and polysemy (run a bath, run a mile). By adding class intersections based on a single shared sense-tagged word, we minimize the impact of the non-exhaustiveness of Levin’s database (Dorr and Olsen, 1996; Dorr, To appear). By examining the syntactic properties of the intersective classes, we provide a clearer picture of the relationship between WordNet/EuroWordNet and the LCS interlingua for machine translation and other NLP applications.
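The effect of sense tags on class intersection can be illustrated with a minimal sketch. It is not the authors' implementation: the class labels, the sense tags, and the function name `intersect_classes` are all hypothetical toy values chosen for illustration, and only the "run a bath" versus "run a mile" contrast comes from the abstract above. The sketch compares intersecting classes on bare word forms with intersecting them on sense-tagged words.

```python
from itertools import combinations

# Hypothetical toy data: class labels and sense tags are invented for
# illustration and are not taken from Levin (1993) or WordNet.
TOY_CLASSES = {
    "run_motion": {("run", "run#move_fast"), ("jog", "jog#move_fast")},
    "prepare": {("run", "run#cause_to_flow"), ("pour", "pour#cause_to_flow")},
    "roll_motion": {("run", "run#move_fast"), ("roll", "roll#move")},
}


def intersect_classes(classes, use_senses):
    """Return {(class_a, class_b): shared_words} for every overlapping pair.

    With use_senses=False, two classes overlap if they share a word form.
    With use_senses=True, they overlap only if they share a word with the
    same sense tag, which filters out overlaps caused purely by polysemy.
    """
    result = {}
    for (a, members_a), (b, members_b) in combinations(sorted(classes.items()), 2):
        if use_senses:
            shared = {word for word, _ in members_a & members_b}
        else:
            shared = {w for w, _ in members_a} & {w for w, _ in members_b}
        if shared:
            result[(a, b)] = shared
    return result


by_word = intersect_classes(TOY_CLASSES, use_senses=False)
by_sense = intersect_classes(TOY_CLASSES, use_senses=True)
```

In this toy data, word-only intersection pairs "prepare" with both motion classes because each contains the form "run", while the sense-constrained version keeps only the pair whose members share the same tagged sense. A single shared sense-tagged word is enough to form an intersection here, mirroring the single-word criterion the abstract mentions.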