Alexander Fraser
2026
Evaluating Latin and Ancient Greek Sentence Alignment through Parallel Sentence Mining
Sebastian Reichbauer | Shu Okabe | Alexander Fraser
Proceedings of the 6th International Conference on Natural Language Processing for the Digital Humanities
Cross-lingual detection of intertextuality and translation in Latin and Ancient Greek through computational approaches is of great interest for classical studies. While several systems exist for parallel sentence detection, based on general multilingual or specific models for Latin–Ancient Greek, they have not been compared against each other. Therefore, we present a synthetic benchmark to evaluate the performance of language models regarding cross-lingual Ancient Greek and Latin parallel sentence mining. We first compare six language models to encode sentences and then further improve the cross-lingual alignment through post-processing, fine-tuning, and knowledge distillation. We find that the whitening transformation in combination with knowledge distillation provides excellent results. Specifically, SPhilBERTa, a trilingual language model for Ancient Greek and Latin, benefits the most from the improvements and achieves a substantial mining score of 97.6 on our benchmark.
Dial HEALTHDIAL for Advice: A Multilingual and Multi-Parallel Spoken Dialogue Dataset for Knowledge-Grounded Information Seeking
Songbo Hu | Yinhong Liu | Ej Zhou | Evgeniia Razumovskaia | Xiaobin Wang | Alexander Fraser | Ivan Vulić | Anna Korhonen
Findings of the Association for Computational Linguistics: ACL 2026
Creating spoken dialogue datasets is methodologically challenging, and these challenges are amplified when the goal is to build multilingual, multi-parallel datasets at scale. This work introduces HEALTHDIAL, a large-scale, multilingual, and multi-parallel dataset for developing and evaluating retrieval-augmented generation (RAG)–based spoken dialogue systems. The dataset comprises 6,000 information-seeking dialogues (1,500 per language) grounded in trusted content from the World Health Organization (WHO) and 163 hours of user speech recorded from native speakers of diverse dialects across four official WHO languages: Arabic, Chinese, English, and Spanish. Each speaker is annotated with demographic (e.g., gender, age) and sociolinguistic (e.g., primary language, region of origin) variables. We report benchmark results across key dialogue tasks, which reveal consistent performance disparities across languages, even among high-resource ones. To support future research, we release the dataset, a prototype system, and a toolkit for data collection and system evaluation.
Optical Character Recognition for the International Phonetic Alphabet
Shu Okabe | Dejvi Zelo | Alexander Fraser
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
As grammar books are increasingly used as additional reference resources specifically for very low-resource languages, a significant portion comes from scans and relies on the quality of the Optical Character Recognition (OCR) tool. We focus here on a particular script used in linguistics to transcribe sounds: the International Phonetic Alphabet (IPA). We consider two data sources: actual grammar book PDFs for two languages under documentation, Japhug and Kagayanen, and a synthetically generated dataset based on Wiktionary. We compare two neural OCR frameworks, Tesseract and Calamari, and a recent large vision-language model, Qwen2.5-VL-7B, all three in an off-the-shelf setting and with fine-tuning. While their zero-shot performance is relatively poor for IPA characters in general due to character set mismatch, fine-tuning with the synthetic dataset leads to notable improvements.
NLP for Social Good: A Survey and Outlook of Challenges, Opportunities and Responsible Deployment
Antonia Karamolegkou | Angana Borah | Eunjung Cho | Sagnik Ray Choudhury | Martina Galletti | Pranav Gupta | Oana Ignat | Priyanka Kargupta | Neema Kotonya | Hemank Lamba | Sun-Joo Lee | Arushi Mangla | Ishani Mondal | Fatima Zahra Moudakir | Deniz Nazar | Poli Nemkova | Dina Pisarevskaya | Naquee Rizwan | Nazanin Sabri | Keenan Samway | Dominik Stammbach | Anna Steinberg Schulten | David Tomás | Steven R Wilson | Bowen Yi | Jessica H Zhu | Arkaitz Zubiaga | Anders Søgaard | Alexander Fraser | Zhijing Jin | Rada Mihalcea | Joel R. Tetreault | Daryna Dementieva
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Natural language processing (NLP) now shapes many aspects of our world, yet its potential for positive social impact is underexplored. This paper surveys work in “NLP for Social Good” (NLP4SG) across nine domains relevant to global development and risk agendas, summarizing principal tasks and challenges. We analyze ACL Anthology trends, finding that inclusion and AI harms attract the most research, while domains such as poverty, peacebuilding, and environmental protection remain underexplored. Guided by our review, we outline opportunities for responsible and equitable NLP and conclude with a call for cross-disciplinary partnerships and human-centered approaches to ensure that future NLP technologies advance the public good.
Co-authors
- Shu Okabe 2
- Angana Borah 1
- Eunjung Cho 1
- Sagnik Ray Choudhury 1
- Daryna Dementieva 1
- Martina Galletti 1
- Pranav Gupta 1
- Songbo Hu 1
- Oana Ignat 1
- Zhijing Jin 1
- Antonia Karamolegkou 1
- Priyanka Kargupta 1
- Anna Korhonen 1
- Neema Kotonya 1
- Hemank Lamba 1
- Sun-Joo Lee 1
- Yinhong Liu 1
- Arushi Mangla 1
- Rada Mihalcea 1
- Ishani Mondal 1
- Fatima Zahra Moudakir 1
- Deniz Nazar 1
- Poli Nemkova 1
- Dina Pisarevskaya 1
- Evgeniia Razumovskaia 1
- Sebastian Reichbauer 1
- Naquee Rizwan 1
- Nazanin Sabri 1
- Keenan Samway 1
- Anna Steinberg Schulten 1
- Dominik Stammbach 1
- Anders Søgaard 1
- Joel Tetreault 1
- David Tomás 1
- Ivan Vulić 1
- Xiaobin Wang 1
- Steven R Wilson 1
- Bowen Yi 1
- Dejvi Zelo 1
- Ej Zhou 1
- Jessica H Zhu 1
- Arkaitz Zubiaga 1