Sara Mendes

The growing amount of available information and the importance given to the access to technical information enhance the potential role of NLP applications in enabling users to deal with information for a variety of knowledge domains. In this process, language resources are crucial. This paper presents Lextec, a rich computational language resource for technical vocabulary in Portuguese. Encoding a representative set of terms for ten different technical domains, this concept-based relational language resource combines a wide range of linguistic information by integrating each entry in a domain-specific wordnet and associating it with a precise definition for each lexicalization in the technical domain at stake, illustrative texts and information for translation into English.

The work detailed in this paper describes a 2-step cascade approach for the classification of complex-type nominals. We describe an experiment that demonstrates how a cascade approach performs when the task consists in distinguishing nominals from a given complex-type from any other noun in the language. Overall, our classifier successfully identifies very specific and not highly frequent lexical items such as complex-types with high accuracy, and distinguishes them from those instances that are not complex types by using lexico-syntactic patterns indicative of the semantic classes corresponding to each of the individual sense components of the complex type. Although there is still room for improvement with regard to the coverage of the classifiers developed, the cascade approach increases the precision of classification of the complex-type nouns that are covered in the experiment presented.

This work addresses the classification of word pairs as instances of lexical-semantic relations. The classification is approached by leveraging patterns of co-occurrence contexts from corpus data. The significance of using dependency information, of augmenting the set of dependency paths provided to the system, and of generalizing patterns using part-of-speech information for the classification of lexical-semantic relation instances is analyzed. Results show that dependency information is decisive to achieve better results both in precision and recall, while generalizing features based on dependency information by replacing lexical forms with their part-of-speech increases the coverage of classification systems. Our experiments also make apparent that approaches based on the context where word pairs co-occur are upper-bound-limited by the times these appear in the same sentence. Therefore strategies to use information across sentence boundaries are necessary.

Sara Mendes

Fixing paper assignments

2015

2014

2013

2012

2011

2006

Co-authors

Venues