Abstract
The paper introduces a “train once, use many” approach for the syntactic analysis of phrasal compounds (PCs) of the type XP+N, such as “Would you like to sit on my knee?” nonsense. PCs are a challenge for NLP tools since they require the identification of a syntactic phrase within a morphological complex. We propose a method which uses a state-of-the-art dependency parser not only to analyse sentences (the environment of PCs) but also to compound the non-head of PCs under one well-defined condition, namely the analysis of the non-head spanning from the left boundary (mostly marked by a determiner) to the nominal head of the PC. The method comprises the following steps: (a) the use of an English state-of-the-art dependency parser on data comprising sentences with PCs from the British National Corpus (BNC), (b) the detection of parsing errors of PCs, (c) the separate treatment of the non-head structure using the same model, and (d) the attachment of the non-head to the compound head. The evaluation of the method showed that the accuracy of 76% could be improved by adding a step to the PC compounder module that specifies user-defined contexts sensitive to the part of speech of the non-head parts, and by using TreeTagger, in line with our approach.
- Anthology ID:
- L16-1174
- Volume:
- Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
- Month:
- May
- Year:
- 2016
- Address:
- Portorož, Slovenia
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 1092–1097
- URL:
- https://preview.aclanthology.org/remove-affiliations/L16-1174/
- Cite (ACL):
- Carola Trips. 2016. Syntactic Analysis of Phrasal Compounds in Corpora: a Challenge for NLP Tools. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1092–1097, Portorož, Slovenia. European Language Resources Association (ELRA).
- Cite (Informal):
- Syntactic Analysis of Phrasal Compounds in Corpora: a Challenge for NLP Tools (Trips, LREC 2016)
- PDF:
- https://preview.aclanthology.org/remove-affiliations/L16-1174.pdf