Chandresh Kumar Maurya
2026
CLAOCS-TX: Cross-Lingual Triplet Extraction with Aspect-Opinion-Aware Code-Switched Prompting and LLM-Guided Contrastive Distillation
Lipika Dewangan | Chandresh Kumar Maurya
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Cross-lingual learning enables the transfer of structured sentiment knowledge from high-resource languages to unlabeled or low-resource languages, but prior work has largely focused on coarse-grained sentiment classification or aspect extraction. In contrast, zero-shot cross-lingual aspect–opinion–sentiment triplet extraction (ASTE), which extracts sentiment triplets of the form (aspect term, opinion term, sentiment polarity), remains underexplored. We propose a unified framework that leverages large language models (LLMs) as both structured pseudo-label generators and semantic teachers for ASTE. Our approach employs stepwise structured prompting over aspect- and opinion-aware code-switched variants to generate reliable pseudo triplets, followed by a multi-variant consistency filter to retain high-confidence supervision. We further introduce a triplet-aware contrastive distillation objective that aligns student triplet representations with LLM-encoded semantic embeddings. During inference, only the student ASTE model is used, without requiring LLM access. Experiments on four non-Indic and four low-resource Indic target languages show consistent improvements over strong cross-lingual and LLM-based baselines. The proposed method yields an absolute micro-F1 improvement of 5.3 points on non-Indic languages and 3.8 points on low-resource Indic languages compared to the best competing approach. Ablation results further validate the complementary roles of aspect- and opinion-aware code-switched prompting and triplet-aware contrastive distillation, with larger relative gains observed in low-resource Indic settings.
2025
Indic-S2ST: a Multilingual and Multimodal Many-to-Many Indic Speech-to-Speech Translation Dataset
Nivedita Sethiya | Puneet Walia | Chandresh Kumar Maurya
Proceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Speech-to-Speech Translation (S2ST) converts speech from one language to speech in a different language. While various S2ST models exist, none adequately support Indic languages, primarily due to the lack of a suitable dataset. We fill this gap by introducing Indic-S2ST, a multilingual and multimodal many-to-many S2ST dataset of approximately 600 hours in 14 Indic languages, including Indian-accented English. To the best of our knowledge, this is the largest dataset for the S2ST task with parallel speech and text in 14 scheduled Indic languages. Because its speech and text are aligned in parallel, the dataset also supports Automatic Speech Recognition (ASR), Text-to-Speech (TTS) synthesis, Speech-to-Text translation (ST), and Machine Translation (MT). Thus, it may be useful for training a model like Meta’s SeamlessM4T for Indic languages. We also propose Indic-S2UT, a discrete unit-based S2ST model for Indic languages. To showcase the utility of the dataset, we present baseline results on Indic-S2ST using Indic-S2UT. The dataset and code are available at https://github.com/Nivedita5/Indic-S2ST/blob/main/README.md.
Findings of the IWSLT 2025 Evaluation Campaign
Idris Abdulmumin | Victor Agostinelli | Tanel Alumäe | Antonios Anastasopoulos | Luisa Bentivogli | Ondřej Bojar | Claudia Borg | Fethi Bougares | Roldano Cattoni | Mauro Cettolo | Lizhong Chen | William Chen | Raj Dabre | Yannick Estève | Marcello Federico | Mark Fishel | Marco Gaido | Dávid Javorský | Marek Kasztelnik | Fortuné Kponou | Mateusz Krubiński | Tsz Kin Lam | Danni Liu | Evgeny Matusov | Chandresh Kumar Maurya | John P. McCrae | Salima Mdhaffar | Yasmin Moslem | Kenton Murray | Satoshi Nakamura | Matteo Negri | Jan Niehues | Atul Kr. Ojha | John E. Ortega | Sara Papi | Pavel Pecina | Peter Polák | Piotr Połeć | Ashwin Sankar | Beatrice Savoldi | Nivedita Sethiya | Claytone Sikasote | Matthias Sperber | Sebastian Stüker | Katsuhito Sudoh | Brian Thompson | Marco Turchi | Alex Waibel | Patrick Wilken | Rodolfo Zevallos | Vilém Zouhar | Maike Züfle
Proceedings of the 22nd International Conference on Spoken Language Translation (IWSLT 2025)
This paper presents the outcomes of the shared tasks conducted at the 22nd International Workshop on Spoken Language Translation (IWSLT). The workshop addressed seven critical challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, model compression, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks garnered significant participation, with 32 teams submitting their runs. The field’s growing importance is reflected in the increasing diversity of shared task organizers and contributors to this overview paper, representing a balanced mix of industrial and academic institutions. This broad participation demonstrates the rising prominence of spoken language translation in both research and practical applications.
Co-authors
- Nivedita Sethiya 2
- Idris Abdulmumin 1
- Victor Agostinelli 1
- Tanel Alumäe 1
- Antonios Anastasopoulos 1
- Luisa Bentivogli 1
- Ondřej Bojar 1
- Claudia Borg 1
- Fethi Bougares 1
- Roldano Cattoni 1
- Mauro Cettolo 1
- Lizhong Chen 1
- William Chen 1
- Raj Dabre 1
- Lipika Dewangan 1
- Yannick Estève 1
- Marcello Federico 1
- Mark Fishel 1
- Marco Gaido 1
- Dávid Javorský 1
- Marek Kasztelnik 1
- Fortuné Kponou 1
- Mateusz Krubiński 1
- Tsz Kin Lam 1
- Danni Liu 1
- Evgeny Matusov 1
- John Philip McCrae 1
- Salima Mdhaffar 1
- Yasmin Moslem 1
- Kenton Murray 1
- Satoshi Nakamura 1
- Matteo Negri 1
- Jan Niehues 1
- Atul Kr. Ojha 1
- John E. Ortega 1
- Sara Papi 1
- Pavel Pecina 1
- Peter Polák 1
- Piotr Połeć 1
- Ashwin Sankar 1
- Beatrice Savoldi 1
- Claytone Sikasote 1
- Matthias Sperber 1
- Sebastian Stüker 1
- Katsuhito Sudoh 1
- Brian Thompson 1
- Marco Turchi 1
- Alex Waibel 1
- Puneet Walia 1
- Patrick Wilken 1
- Rodolfo Zevallos 1
- Vilém Zouhar 1
- Maike Züfle 1