Ana-Maria Barbu


2008

Romanian Lexical Data Bases: Inflected and Syllabic Forms Dictionaries
Ana-Maria Barbu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents two lexical databases for Romanian: RoMorphoDict, a dictionary of inflected forms, and RoSyllabiDict, a dictionary of syllabified inflected forms. Each database is available in two Unicode formats: text and XML. In text format, a RoMorphoDict entry contains the inflected form, its lemma, its morpho-syntactic description and a marking of the stressed vowel in pronunciation; in XML format, an entry represents the whole paradigm of a word and additionally contains information about roots and paradigm class. A RoSyllabiDict entry, in both formats, contains the unsyllabified word, its syllabified counterpart, grammatical information and/or the type of syllabification, where applicable. The stressed vowel is also marked on the syllabified form. Each lexical database includes the inflected forms of about 65,000 lemmas, that is, over 700,000 entries in RoMorphoDict and over 500,000 entries in RoSyllabiDict. Both resources are available for free. The paper describes in detail the content of these databases and the procedure for building them.

2006

Romanian Valence Dictionary in XML Format
Ana-Maria Barbu | Emil Ionescu | Verginica Barbu Mititelu
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Valence dictionaries are dictionaries in which logical predicates (most often verbs) are inventoried along with semantic and syntactic information on the roles of the arguments they combine with, as well as the syntactic restrictions these arguments must obey. In this article we present the initial stage of the project “Syntactic and semantic database in XML format: an HPSG representation of verb valences in Romanian”. Its aim is to develop a valence dictionary in XML format for a set of 3000 Romanian verbs. Valences are specified for each sense of each verb, along with an illustrative example, possible argument alternations and a set of multiword expressions in which the verb occurs with that sense. The grammatical formalism we use is Head-driven Phrase Structure Grammar, which offers one of the most comprehensive frameworks for encoding various types of linguistic information for lexical items. XML is the most appropriate mark-up language for describing information structured in the HPSG framework. The project can later be extended to cover all Romanian verbs (around 7000) as well as other predicates (nouns, adjectives, prepositions).

2004

A Word Alignment System Based on a Translation Equivalence Extractor
Ana-Maria Barbu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

TREQ-AL: A word alignment system with limited language resources
Dan Tufiş | Ana-Maria Barbu | Radu Ion
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond

2002

Lexical token alignment: experiments, results and applications
Dan Tufiş | Ana-Maria Barbu
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)