This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
LarsHellan
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
In this paper, we describe NoVRol, a semantic role lexicon of Norwegian verbs. We start from the NorVal valency lexicon, which describes the syntactic frames of 7.400 verbs. We then enrich each of these frames by annotating, based on the VerbNet annotation scheme, each argument of the verb with the semantic role that it gets. We also encode the syntactic roles of the arguments based on the UD annotation scheme. Our resource will faciliate future research on Norwegian verbs, and can at a future stage be expanded to a full VerbNet
The paper presents an annotation schema with the following characteristics: it is formally compact; it systematically and compositionally expands into fullfledged analytic representations, exploiting simple algorithms of typed feature structures; its representation of various dimensions of semantic content is systematically integrated with morpho-syntactic and lexical representation; it is integrated with a ‘deep’ parsing grammar. Its compactness allows for efficient handling of large amounts of structures and data, and it is interoperable in covering multiple aspects of grammar and meaning. The code and its analytic expansions represent a cross-linguistically wide range of phenomena of languages and language structures. This paper presents its syntactic-semantic interoperability first from a theoretical point of view and then as applied in linguistic description.
Verb valence information can be derived from corpora by using subcorpora of typical sentences that are constructed in a language independent manner based on frequent POS structures. The inspection of typical sentences with a fixed verb in a certain position can show the valence information directly. Using verb fingerprints, consisting of the most typical sentence patterns the verb appears in, we are able to identify standard valence patterns and compare them against a language’s valence profile. With a very limited number of training data per language, valence information for other verbs can be derived as well. Based on the Norwegian valence patterns we are able to find comparative patterns in German where typical sentences are able to express the same situation in an equivalent way and can so construct verb valence pairs for a bilingual PolyVal dictionary. This contribution discusses this application with a focus on the Norwegian valence dictionary NorVal.
The paper describes aspects of an HPSG style computational grammar of the West African language Ga (a Kwa language spoken in the Accra area of Ghana). As a Volta Basin Kwa language, Ga features many types of multiverb expressions and other particular constructional patterns in the verbal and nominal domain. The paper highlights theoretical and formal features of the grammar motivated by these phenomena, some of them possibly innovative to the formal framework. As a so-called deep grammar of the language, it hosts a rich lexical structure, and we describe ways in which the grammar builds on previously available lexical resources. We outline an environment of current resources in which the grammar is part, and lines of research and development in which it and its environment can be used.
Traditionally, a lexicographer identifies the lexical items to be added to a dictionary. Here we present a corpus-based approach to dictionary compilation and describe a procedure that derives a Twi dictionary from a TypeCraft corpus of Interlinear Glossed Texts. We first extracted a list of unique words. We excluded words belonging to different dialects of Akan (mostly Fante and Abron). We corrected misspellings and distinguished English loan words to be integrated in our dictionary from instances of code switching. Next to the dictionary itself, one other resource arising from our work is a lexicographical model for Akan which represents the lexical resource itself, and the extended morphological and word class inventories that provide information to be aggregated. We also represent external resources such as the corpus that serves as the source and word level audio files. The Twi dictionary consists at present of 1367 words; it will be available online and from an open mobile app.
MultiVal is a valence lexicon derived from lexicons of computational HPSG grammars for Norwegian, Spanish and Ga (ISO 639-3, gaa), with altogether about 22,000 verb entries and on average more than 200 valence types defined for each language. These lexical resources are mapped onto a common set of discriminants with a common array of values, and stored in a relational database linked to a web demo and a wiki presentation. Search discriminants are syntactic argument structure (SAS), functional specification, situation type and aspect, for any subset of languages, as well as the verb type systems of the grammars. Search results are lexical entries satisfying the discriminants entered, exposing the specifications from the respective provenance grammars. The Ga grammar lexicon has in turn been converted from a Ga Toolbox lexicon. Aside from the creation of such a multilingual valence resource through converging or converting existing resources, the paper also addresses a tool for the creation of such a resource as part of corpus annotation for less resourced languages.
The paper presents advances in the use of semantic features and interlingua relations for word sense disambiguation (WSD) as part of unification-based deep processing grammars. Formally we present an extension of Minimal Recursion Semantics, introducing sortal specifications as well as linterlingua semantic relations as a means of semantic decomposition.