Radosław Moszczyński
2008
Enhancing an English-Polish Electronic Dictionary for Multiword Expression Research
Piotr Bański, Radosław Moszczyński
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper describes a project aimed at converting a legacy representation of English idioms into an XML-based format. The project is set in the context of a large electronic English-Polish dictionary which contains several hundred formalized idiom descriptions and which has been released under the terms of a free license. In short, the project consists of three phases: cleaning up the dictionary markup, extracting the legacy idiom representations, and converting them into TEI P5 XML constrained by a RelaxNG grammar created for this purpose and constituting a module that can be included as part of the TEI P5 schema. The paper contains general descriptions of the individual phases and several examples of XML-encoded idioms. It also suggests some directions for further research, which include abstracting the XML-ized idiom representations into general syntactic patterns and using the representations to automatically identify idioms in tagged corpora.
2007
A Practical Classification of Multiword Expressions
Radosław Moszczyński
Proceedings of the ACL 2007 Student Research Workshop