Mateusz Czajka
2026
AMU at RAG4Reports 2026 Task B: A Practical Multilingual RAG Pipeline for Citation-Grounded Reports
Maciej Czajka | Piotr Jabłoński | Mateusz Czajka | Konrad Pierzyński | Krzysztof Jassem
Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026)
Maciej Czajka | Piotr Jabłoński | Mateusz Czajka | Konrad Pierzyński | Krzysztof Jassem
Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026)
This system paper presents AMU’s submission to RAG4Reports 2026 Task B: a practical multilingual retrieval-augmented generation pipeline for evidence-supported report generation. The system combines full-query retrieval, optional query rewriting, dense retrieval with Qdrant, cross-encoder reranking, diversity-aware context selection, and structured generation. The best submitted run uses BAAI/bge-m3 embeddings, BAAI/bge-reranker-v2-m3 reranking, and gpt-5.1 generation with medium reasoning effort, using a partial-coverage prompt strategy. On the official leaderboard, it achieved F1=0.4351, sentence_support=0.8280, and nugget_coverage=0.3403, indicating that the generated reports were well grounded but only partially comprehensive.
From Metrics to Meaning: Rule-Grounded LLM Explanations for Data Literacy in the Case of Youth Football
Tomasz Piłka | Tomasz Kuczyński | Mateusz Czajka
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Tomasz Piłka | Tomasz Kuczyński | Mateusz Czajka
Proceedings of the 21st Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2026)
Young athletes, parents, and coaches are increasingly exposed to training metrics from wearable technology, yet such metrics are difficult to interpret without contextual explanation. We present a rule-grounded data-to-text framework for supporting data literacy in youth football through concise, stakeholder-specific summaries of training sessions. A rule layer maps duration-normalised indicators to structured facts about session profile, internal intensity, speed exposure, and movement dynamics, which are then verbalised by a large language model for coaches, parents, or players. We compare direct generation from raw metrics, generation from rule-derived facts, and an augmented rule-grounded configuration, ENRICHED, that supplements validated facts with raw metrics and explicit threshold definitions. In this setting, selected open-weight models are additionally adapted using LoRA. The framework is developed using 122 anonymised player-session records from a U15 environment and evaluated on a held-out subset of ten sessions with stakeholder-oriented reference summaries. The results indicate that rule grounding improves reliability and audience adaptation compared with direct generation from raw metrics, particularly by reducing unsupported or overly strong interpretations. A school-based expert evaluation with physical education teachers further suggests that player-facing explanations in the evaluated ENRICHED setting can remain accurate, comprehensible, and practically useful. We position the framework as an interpretable data-literacy support interface for youth sport analytics.
2025
Lightweight IPIS Instruction Tuning of Bielik-7B for Gender-Inclusive Polish<—>English Translation: System Description for PolEval 2025 Task 2 (IPIS-translation)
Mateusz Czajka
Proceedings of the PolEval 2025 Workshop
Mateusz Czajka
Proceedings of the PolEval 2025 Workshop
We describe a compact but fully open-source system submitted to PolEval 2025 Task 2 (Gender-inclusive LLMs for Polish), subtask B: IPIS-translation. The goal of this subtask is gender-sensitive Polish↔English translation, including the production of gender-inclusive Polish outputs that follow specific orthographic conventions such as gender stars and slash forms. Our method performs instruction tuning of the Polish LLM Bielik-7B-Instruct using parameter-efficient LoRA adapters, with optional 4-bit NF4 quantization for single-GPU training. Samples from the Inclusive Polish Instruction Set (IPIS) are converted into a chat-style format with a task-provided gender-inclusive system prompt. Despite a deliberately lightweight tuning budget and greedy decoding, our submission placed 3rd on the hidden test B split, achieving bleu_pe = 20.7871. We detail the training and inference pipeline, discuss design choices and limitations, and outline directions for improving inclusive translation quality in Polish.