Daniel Cieślak
2026
Leakage-Aware User-Level ADHD Signal Classification from Social Media: When Graph Aggregation Helps, and When It Does Not
Daniel Cieślak | Władysław Średniawa
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
User-level ADHD-related text classification from social media is methodologically challenging because predictions must aggregate many short posts, performance can be inflated by direct diagnostic leakage, and screening-adjacent settings require calibrated probabilities rather than discrimination alone. We introduce a leakage-aware evaluation framework organized around two controlled axes: evidence budget, i.e., the number of tweets available per user, and leakage control. Within this setup, we compare document-level transformers, strong non-graph embedding-pooling baselines, and heterogeneous graph models combining semantic tweet embeddings, psycholinguistic features, and temporal structure. The main result is regime-dependent: graph aggregation is most useful when user evidence is scarce, whereas simple embedding pooling becomes highly competitive and often slightly stronger as more evidence becomes available. Overall, the main contribution is a controlled benchmarking framework and a clearer account of when graph-based aggregation is actually beneficial.
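The two controlled axes in this abstract, evidence budget and leakage control, can be illustrated with a minimal sketch of the non-graph embedding-pooling baseline. Everything named here is an assumption made for illustration: the `LEAKAGE` pattern list, the choice to filter before truncating, the use of the first `budget` tweets rather than a random sample, and the caller-supplied `embed` function are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical leakage patterns; the paper's actual filter is not specified here.
LEAKAGE = re.compile(r"\b(adhd|diagnosed with|attention deficit)\b", re.IGNORECASE)


def apply_leakage_control(tweets):
    """Drop tweets that contain a direct diagnostic mention (assumed rule)."""
    return [t for t in tweets if not LEAKAGE.search(t)]


def apply_budget(tweets, budget):
    """Keep at most `budget` tweets per user (assumed: the first ones)."""
    return tweets[:budget]


def mean_pool(vectors):
    """Average a non-empty list of equal-length vectors into one user vector."""
    if not vectors:
        raise ValueError("cannot pool an empty list of vectors")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def user_vector(tweets, embed, budget, leakage_control=True):
    """Build one user-level representation under a given evaluation cell.

    `embed` is any function mapping a tweet string to a list of floats.
    """
    if leakage_control:
        tweets = apply_leakage_control(tweets)
    tweets = apply_budget(tweets, budget)
    return mean_pool([embed(t) for t in tweets])
```

Sweeping `budget` over several values, with `leakage_control` on and off, yields the grid of conditions under which pooling and graph models would be compared. The pooled vector would then be passed to any standard classifier.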
Does Locality Cost in Polish Medical Text Classification? Duplicate-Aware Evaluation of Federated Learning
Daniel Cieślak | Andrzej Czyżewski
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)
Federated learning is often framed as a practical trade-off in clinical NLP: safer data handling at the cost of lower predictive performance. We revisit this assumption in a benchmark-specific study of Polish medical text classification. A key issue is evaluation granularity: the test split contains 10,634 rows but only 670 unique normalized text hashes, with 18 inconsistent groups removed in strict grouped evaluation. We therefore compare centralized and federated training under both conventional instance-level scoring and a stricter hash-level protocol that controls duplicate inflation. In the strongest reported settings, federated training matches or slightly exceeds the centralized baseline, reaching instance-level Macro-F1 of 0.8826 ± 0.0177 versus 0.8689 ± 0.0124, and hash-level Macro-F1 of 0.8908 ± 0.0220 versus 0.8841 ± 0.0078. The claim is deliberately narrow: we do not argue that federated learning is generally superior to centralized training, nor do we claim formal privacy guarantees. Rather, we show that in this duplicate-heavy Polish medical text benchmark, conclusions about locality depend strongly on evaluation hygiene.
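The difference between instance-level and hash-level scoring can be made concrete with a small sketch. It rests on stated assumptions: the normalization rule (lowercasing and whitespace collapsing), the use of SHA-256, and majority voting over predictions within a duplicate group are illustrative choices, not details confirmed by the abstract. Groups whose gold labels disagree are dropped, mirroring the strict grouped evaluation described above.

```python
import hashlib
from collections import Counter, defaultdict


def text_hash(text):
    """Hash a normalized text (assumed: lowercase, collapsed whitespace)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def hash_level_view(texts, y_true, y_pred):
    """Collapse duplicate rows into one (gold, prediction) pair per hash.

    Returns the grouped gold labels, grouped predictions, and the number
    of groups dropped because their gold labels were inconsistent.
    """
    groups = defaultdict(list)
    for text, gold, pred in zip(texts, y_true, y_pred):
        groups[text_hash(text)].append((gold, pred))

    g_true, g_pred, dropped = [], [], 0
    for rows in groups.values():
        gold_labels = {gold for gold, _ in rows}
        if len(gold_labels) > 1:
            dropped += 1  # strict protocol: skip inconsistent groups
            continue
        g_true.append(gold_labels.pop())
        # Assumed aggregation: majority vote; ties go to the first-seen label.
        g_pred.append(Counter(pred for _, pred in rows).most_common(1)[0][0])
    return g_true, g_pred, dropped
```

Calling `macro_f1(y_true, y_pred)` on the raw rows gives the instance-level score, while calling it on the output of `hash_level_view` gives the hash-level score. When a few texts are repeated many times, the two numbers can diverge, which is the duplicate inflation the stricter protocol is meant to control.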