Extraction of Contrastive Rules from Syntactic Treebanks: A Case Study in Romance Languages
Santiago Herrera, Ioana-Madalina Silai, Bruno Guillaume, Sylvain Kahane
Abstract
In this paper, we develop a data-driven contrastive framework to extract common and distinctive linguistic descriptions from syntactic treebanks. The extracted contrastive rules are defined by a statistically significant difference in precision and classified as common and distinctive rules across the set of treebanks. We illustrate our method by working on object word order using Universal Dependencies (UD) treebanks in 6 Romance languages: Brazilian Portuguese, Catalan, French, Italian, Romanian and Spanish. We discuss the limitations faced due to inconsistent annotation and the feasibility of conducting contrasting studies using the UD collection.
- Anthology ID:
- 2025.quasy-1.5
- Volume:
- Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025)
- Month:
- August
- Year:
- 2025
- Address:
- Ljubljana, Slovenia
- Editors:
- Xinying Chen, Yaqin Wang
- Venues:
- Quasy | WS | SyntaxFest
- SIG:
- SIGPARSE
- Publisher:
- Association for Computational Linguistics
- Pages:
- 26–38
- URL:
- https://preview.aclanthology.org/mtsummit-25-ingestion/2025.quasy-1.5/
- Cite (ACL):
- Santiago Herrera, Ioana-Madalina Silai, Bruno Guillaume, and Sylvain Kahane. 2025. Extraction of Contrastive Rules from Syntactic Treebanks: A Case Study in Romance Languages. In Proceedings of the Third Workshop on Quantitative Syntax (QUASY, SyntaxFest 2025), pages 26–38, Ljubljana, Slovenia. Association for Computational Linguistics.
- Cite (Informal):
- Extraction of Contrastive Rules from Syntactic Treebanks: A Case Study in Romance Languages (Herrera et al., Quasy-SyntaxFest 2025)
- PDF:
- https://preview.aclanthology.org/mtsummit-25-ingestion/2025.quasy-1.5.pdf