Does Locality Cost in Polish Medical Text Classification? Duplicate-Aware Evaluation of Federated Learning

Daniel Cieślak, Andrzej Czyżewski


Abstract
Federated learning is often framed as a practical trade-off in clinical NLP: safer data handling at the cost of lower predictive performance. We revisit this assumption in a benchmark-specific study of Polish medical text classification. A key issue is evaluation granularity: the test split contains 10,634 rows but only 670 unique normalized text hashes, with 18 inconsistent groups removed in strict grouped evaluation. We therefore compare centralized and federated training under both conventional instance-level scoring and a stricter hash-level protocol that controls duplicate inflation. In the strongest reported settings, federated training matches or slightly exceeds the centralized baseline, reaching instance-level Macro-F1 of 0.8826 ± 0.0177 versus 0.8689 ± 0.0124, and hash-level Macro-F1 of 0.8908 ± 0.0220 versus 0.8841 ± 0.0078. The claim is deliberately narrow: we do not argue that federated learning is generally superior to centralized training, nor do we claim formal privacy guarantees. Rather, we show that in this duplicate-heavy Polish medical text benchmark, conclusions about locality depend strongly on evaluation hygiene.
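The difference between instance-level and hash-level scoring described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the normalization rule (lowercasing and whitespace collapsing), the use of SHA-256, the reading of "inconsistent group" as a hash group whose rows carry conflicting gold labels, the majority vote over predictions within a group, and the toy texts and label names.

```python
import hashlib
import re
from collections import Counter, defaultdict


def normalize(text):
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return re.sub(r"\s+", " ", text.strip().lower())


def text_hash(text):
    # Assumed hash function; any stable digest of the normalized text would do.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def macro_f1(gold, pred):
    # Unweighted mean of per-class F1 over every label seen in gold or pred.
    labels = sorted(set(gold) | set(pred))
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def hash_level(texts, gold, pred):
    # Collapse rows that share a normalized text hash into one item each.
    groups = defaultdict(list)
    for t, g, p in zip(texts, gold, pred):
        groups[text_hash(t)].append((g, p))
    gold_out, pred_out, dropped = [], [], 0
    for rows in groups.values():
        if len({g for g, _ in rows}) > 1:
            # Assumed meaning of "inconsistent": conflicting gold labels.
            dropped += 1
            continue
        gold_out.append(rows[0][0])
        # Assumed aggregation: majority vote over the group's predictions.
        pred_out.append(Counter(p for _, p in rows).most_common(1)[0][0])
    return gold_out, pred_out, dropped


# Hypothetical toy data: three near-identical rows, one conflicting pair, one singleton.
texts = ["Ból głowy", "ból  głowy", "BÓL GŁOWY", "Kaszel", "kaszel", "Gorączka"]
gold = ["neuro", "neuro", "neuro", "pulmo", "cardio", "infect"]
pred = ["neuro", "neuro", "neuro", "pulmo", "pulmo", "neuro"]

instance_score = macro_f1(gold, pred)
hash_gold, hash_pred, n_dropped = hash_level(texts, gold, pred)
hash_score = macro_f1(hash_gold, hash_pred)

print(f"instance-level Macro-F1: {instance_score:.4f}")
print(f"hash-level Macro-F1:     {hash_score:.4f} ({n_dropped} inconsistent group dropped)")
```

In this toy case the three duplicated rows count three times at the instance level but only once at the hash level, which is the kind of duplicate inflation the stricter protocol is meant to control.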
Anthology ID:
2026.acl-srw.44
Volume:
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Month:
July
Year:
2026
Address:
San Diego, California, United States
Editors:
Santosh T.Y.S.S., Juan Diego Rodriguez, Ona de Gibert
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
498–508
URL:
https://preview.aclanthology.org/ingest-acl/2026.acl-srw.44/
Cite (ACL):
Daniel Cieślak and Andrzej Czyżewski. 2026. Does Locality Cost in Polish Medical Text Classification? Duplicate-Aware Evaluation of Federated Learning. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026), pages 498–508, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal):
Does Locality Cost in Polish Medical Text Classification? Duplicate-Aware Evaluation of Federated Learning (Cieślak & Czyżewski, ACL 2026)
PDF:
https://preview.aclanthology.org/ingest-acl/2026.acl-srw.44.pdf