Afonso Sousa
2025
PAM: Paraphrase AMR-Centric Evaluation Metric
Afonso Sousa
|
Henrique Lopes Cardoso
Findings of the Association for Computational Linguistics: ACL 2025
Paraphrasing is rooted in semantics, which makes evaluating paraphrase generation systems hard. Current paraphrase generators are typically evaluated using borrowed metrics from adjacent text-to-text tasks, like machine translation or text summarization. These metrics tend to have ties to the surface form of the reference text. This is not ideal for paraphrases as we typically want variation in the lexicon while persisting semantics. To address this problem, and inspired by learned similarity evaluation on plain text, we propose PAM, a Paraphrase AMR-Centric Evaluation Metric. This metric uses AMR graphs extracted from the input text, which consist of semantic structures agnostic to the text surface form, making the resulting evaluation metric more robust to variations in syntax or lexicon. Additionally, we evaluated PAM on different semantic textual similarity datasets and found that it improves the correlations with human semantic scores when compared to other AMR-based metrics.
2024
PTPARL-V: Portuguese Parliamentary Debates for Voting Behaviour Study
Afonso Sousa
|
Henrique Lopes Cardoso
Proceedings of the IV Workshop on Creating, Analysing, and Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024
We present a new dataset, , that provides valuable insight for advancing discourse analysis of parliamentary debates in Portuguese. This is achieved by processing the open-access information available at the official Portuguese Parliament website and scraping the information from the debate minutes’ PDFs contained therein. Our dataset includes interventions from 547 different deputies of all major Portuguese parties, from 736 legislative initiatives spanning five legislatures from 2005 to 2021. We present a statistical analysis of the dataset compared to other publicly available Portuguese parliamentary debate corpora. Finally, we provide baseline performance analysis for voting behaviour classification.