Corpus-based extraction and identification of Portuguese Multiword Expressions
Sandra Antunes, Maria Fernanda Bacelar do Nascimento, João Miguel Casteleiro, Amália Mendes, Luísa Pereira, Tiago Sá
Abstract
This presentation reports on an ongoing project aimed at building a large lexical database of corpus-extracted multiword (MW) expressions for Portuguese. MW expressions were automatically extracted from a balanced 50-million-word corpus compiled for this project and statistically interpreted using lexical association measures; the results were then manually validated. The lexical database covers different types of MW expressions, from named entities to lexical associations with different degrees of cohesion, ranging from totally frozen idioms to favoured co-occurring forms such as collocations. We aim to achieve two main objectives with this resource: first, to build on the large set of data covering different types of MW expressions in order to revise existing typologies of collocations and integrate them into a larger theory of MW units; second, to use the extensive hand-checked data as training data to evaluate existing statistical lexical association measures.
- Anthology ID:
- 2006.jeptalnrecital-poster.2
- Volume:
- Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Posters
- Month:
- April
- Year:
- 2006
- Address:
- Leuven, Belgique
- Venue:
- JEP/TALN/RECITAL
- Publisher:
- ATALA
- Pages:
- 389–397
- URL:
- https://aclanthology.org/2006.jeptalnrecital-poster.2
- Cite (ACL):
- Sandra Antunes, Maria Fernanda Bacelar do Nascimento, João Miguel Casteleiro, Amália Mendes, Luísa Pereira, and Tiago Sá. 2006. Corpus-based extraction and identification of Portuguese Multiword Expressions. In Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Posters, pages 389–397, Leuven, Belgique. ATALA.
- Cite (Informal):
- Corpus-based extraction and identification of Portuguese Multiword Expressions (Antunes et al., JEP/TALN/RECITAL 2006)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/2006.jeptalnrecital-poster.2.pdf