Archaeology at WE-2026 PARSEME 2.0 Subtask 1 and 2: Parsing is for Encoders, Paraphrasing is for LLMs

Rares-Alexandru Roscan, Sergiu Nisioi


Abstract
This paper presents our approach to the PARSEME 2.0 Shared Task on Romanian, covering both Identification (Subtask 1) and Paraphrasing (Subtask 2). While Large Language Models (LLMs) excel at semantic generation, we hypothesize that they lack the structural precision required for MWE identification, leading to "boundary hallucinations" that compromise downstream simplification. Our Rank 1 results on Romanian confirm this: a specialized encoder (RoBERT) using standard sequence labeling outperforms both few-shot LLMs and complex structural parsers (MTLB-STRUCT). This justifies our proposed pipeline: using encoders as precise "pointers" to guide the generative power of LLMs.
Anthology ID:
2026.mwe-1.31
Volume:
Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Atul Kr. Ojha, Verginica Barbu Mititelu, Mathieu Constant, Ivelina Stoyanova, A. Seza Doğruöz, Alexandre Rademaker
Venues:
MWE | WS
Publisher:
Association for Computational Linguistics
Pages:
237–247
URL:
https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.31/
Cite (ACL):
Rares-Alexandru Roscan and Sergiu Nisioi. 2026. Archaeology at WE-2026 PARSEME 2.0 Subtask 1 and 2: Parsing is for Encoders, Paraphrasing is for LLMs. In Proceedings of the 22nd Workshop on Multiword Expressions (MWE 2026), pages 237–247, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
Archaeology at WE-2026 PARSEME 2.0 Subtask 1 and 2: Parsing is for Encoders, Paraphrasing is for LLMs (Roscan & Nisioi, MWE 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.mwe-1.31.pdf