The Role of Adverbs in Language Variety Identification: The Case of Portuguese Multi-Word Adverbs

Izabela Müller, Nuno Mamede, Jorge Baptista


Abstract
This paper assesses the role of multiword compound adverbs in distinguishing Brazilian Portuguese (PT-BR) from European Portuguese (PT-PT). Two key factors underpin this focus. Firstly, multiword expressions are often less ambiguous than single words, even when their meaning is idiomatic (non-compositional). Secondly, despite constituting a significant portion of the lexicon in many languages, they are frequently overlooked in Natural Language Processing, possibly due to their heterogeneous nature and lexical range. For this study, a large lexicon of Portuguese multiword adverbs (3,665 entries), annotated with diatopic information regarding language variety, was used. The paper investigates the distribution of this category in a corpus of excerpts from journalistic texts sourced from the DSL (Dialect and Similar Language) corpus, with one partition for Brazilian Portuguese (PT-BR) and one for European Portuguese (PT-PT), each containing 18,000 sentences. Results indicate a substantial similarity between the two varieties, with a considerable overlap in the lexicon of multiword adverbs. Additionally, specific adverbs unique to each language variety were identified. Lexical entries recognized in the corpus represent 18.2% (PT-BR) to 19.5% (PT-PT) of the lexicon, with approximately 5,700 matches in each partition. While many of the matches are spurious due to ambiguity with otherwise non-idiomatic, free strings, occurrences of adverbs marked as exclusive to one variety in texts from the other variety are rare.
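The kind of measurement the abstract reports (share of lexicon entries found in each partition, number of matches, and occurrences of variety-exclusive adverbs in the other variety) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the longest-match lookup strategy, the four toy lexicon entries, their variety labels, and the example sentences are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: multiword adverb -> variety label.
# Entries and labels are illustrative only, not taken from the paper's lexicon.
LEXICON = {
    "de repente": "BOTH",
    "às vezes": "BOTH",
    "se calhar": "PT-PT",
    "de jeito nenhum": "PT-BR",
}

# Hypothetical toy partitions standing in for the two corpus partitions.
PT_BR = [
    "De repente, começou a chover.",
    "Ele não vai de jeito nenhum.",
    "Às vezes chove de repente.",
]
PT_PT = [
    "Se calhar chove amanhã.",
    "Às vezes, de repente, muda tudo.",
]


def tokenize(text):
    """Lowercase and split into word tokens (punctuation is dropped)."""
    return re.findall(r"\w+", text.lower())


def find_matches(sentences, lexicon, max_len=6):
    """Count lexicon entries found in the sentences, longest match first."""
    entries = {tuple(tokenize(entry)): entry for entry in lexicon}
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        i = 0
        while i < len(tokens):
            # Try the longest candidate span first; entries have >= 2 tokens.
            for n in range(min(max_len, len(tokens) - i), 1, -1):
                key = tuple(tokens[i:i + n])
                if key in entries:
                    counts[entries[key]] += 1
                    i += n
                    break
            else:
                i += 1  # no entry starts at this token
    return counts


def summarize(counts, lexicon, partition_variety):
    """Lexicon coverage, total matches, and matches exclusive to the other variety."""
    cross = sum(
        c for entry, c in counts.items()
        if lexicon[entry] not in ("BOTH", partition_variety)
    )
    return {
        "coverage": len(counts) / len(lexicon),
        "matches": sum(counts.values()),
        "cross_variety": cross,
    }


if __name__ == "__main__":
    for name, sentences in (("PT-BR", PT_BR), ("PT-PT", PT_PT)):
        print(name, summarize(find_matches(sentences, LEXICON), LEXICON, name))
```

Plain string matching of this kind cannot tell an idiomatic adverb from a free string with the same words, which is the source of the spurious matches the abstract mentions.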
Anthology ID:
2024.vardial-1.8
Volume:
Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024)
Month:
June
Year:
2024
Address:
Mexico City, Mexico
Editors:
Yves Scherrer, Tommi Jauhiainen, Nikola Ljubešić, Marcos Zampieri, Preslav Nakov, Jörg Tiedemann
Venues:
VarDial | WS
Publisher:
Association for Computational Linguistics
Pages:
99–106
URL:
https://aclanthology.org/2024.vardial-1.8
DOI:
10.18653/v1/2024.vardial-1.8
Cite (ACL):
Izabela Müller, Nuno Mamede, and Jorge Baptista. 2024. The Role of Adverbs in Language Variety Identification: The Case of Portuguese Multi-Word Adverbs. In Proceedings of the Eleventh Workshop on NLP for Similar Languages, Varieties, and Dialects (VarDial 2024), pages 99–106, Mexico City, Mexico. Association for Computational Linguistics.
Cite (Informal):
The Role of Adverbs in Language Variety Identification: The Case of Portuguese Multi-Word Adverbs (Müller et al., VarDial-WS 2024)
PDF:
https://preview.aclanthology.org/nschneid-patch-4/2024.vardial-1.8.pdf
Supplementary material:
2024.vardial-1.8.SupplementaryMaterial.txt