Maria Mahbub
2026
ORCHID: Orchestrated Retrieval-Augmented Classification of High-Risk Property with Intelligent Decision-Making
Sanjay Das | Maria Mahbub | Vanessa Lama | Brian Starks | Christopher Polchek | Saffell Silvers | Lauren Deck | Prasanna Balaprakash | Robert M. Patton | Tirthankar Ghosal
Proceedings of the 1st Workshop on Multilingual Report Generation via Retrieval Augmented Generation (RAG4Reports 2026)
High-Risk Property (HRP) classification is critical at U.S. Department of Energy (DOE) sites, where inventories include sensitive and often dual-use equipment. Compliance must track evolving rules designated by various export control policies to make transparent and auditable decisions. Traditional expert-only workflows are time-consuming, backlog-prone, and struggle to keep pace with shifting regulatory boundaries. We propose ORCHID, a modular agentic framework for HRP classification that pairs retrieval-augmented generation (RAG) with human oversight to produce policy-based outputs that can be audited. Small cooperating agents (retrieval, description refiner, classifier, validator, and feedback logger) coordinate via agent-to-agent messaging and invoke tools through the Model Context Protocol (MCP) for model-agnostic on-premise operation. The interface follows an "Item to Evidence to Decision" loop with step-by-step reasoning, on-policy citations, and append-only audit bundles (run-cards, prompts, evidence). In preliminary tests on real HRP cases, ORCHID improves accuracy and traceability over a non-agentic baseline while deferring uncertain items to Subject Matter Experts (SMEs). The demonstration shows single-item submission, grounded citations, SME feedback capture, and exportable audit artifacts, illustrating a practical path to trustworthy LLM assistance in sensitive DOE compliance workflows.
2024
Rosetta Balcanica: Deriving a “Gold Standard” Neural Machine Translation (NMT) Parallel Dataset from High-Fidelity Resources for Western Balkan Languages
Edmon Begoli | Maria Mahbub | Sudarshan Srinivasan
Proceedings of the Seventh Workshop on Technologies for Machine Translation of Low-Resource Languages (LoResMT 2024)
The Rosetta Balcanica is an ongoing effort in resource expansion for low-resource Western Balkan languages. This effort focuses on discovering accurately translated, officially mapped, and curated parallel language resources, and on preparing and using them as neural machine translation (NMT) datasets. Some of the guiding principles, practices, and methods employed by Rosetta Balcanica are generalizable and could apply to other low-resource language resource expansion efforts. With this goal in mind, we first present our rationale and approach to discovering and using meticulously translated and officially curated low-resource language resources, and our use of these resources to develop a parallel “gold standard” translation training resource. Second, we describe our specific methodology for NMT dataset development from these resources and its publication to a widely used and accessible repository for natural language processing (Hugging Face Hub). Finally, we discuss the trade-offs and limitations of our current approach, and the roadmap for future development and expansion of the current Rosetta Balcanica language resource.