Ian Soboroff


2025

pdf bib
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance
Somnath Banerjee | Avik Halder | Rajarshi Mandal | Sayan Layek | Ian Soboroff | Rima Hazra | Animesh Mukherjee
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)

Pretrained language models (PLMs) have revolutionized NLP but amplify linguistic inequities in multilingual applications. While prior studies focused on transformer architectures such as BERT, we evaluate large language models (LLMs) including Mistral, TowerInstruct, OpenHathi, Tamil-Llama, and Kan-Llama. Through rigorous testing across eight languages spanning high-resource (English, German, French, Italian, Spanish) and low-resource (Hindi, Tamil, Kannada) settings, we reveal systemic failures in preserving multilingual reliability and adaptability. Using paradigms like each language for itself’ (ELFI) and each language for others’ (ELFO), we highlight the inability of current LLMs to bridge linguistic divides. Even model merging fail to mitigate these gaps, exposing fundamental limitations. These findings emphasize the critical need for reimagining AI architectures to deliver true linguistic inclusivity and equitable performance across diverse languages.

2022

pdf bib
Development of a MultiModal Annotation Framework and Dataset for Deep Video Understanding
Erika Loc | Keith Curtis | George Awad | Shahzad Rajput | Ian Soboroff
Proceedings of the 2nd Workshop on People in Vision, Language, and the Mind

In this paper we introduce our approach and methods for collecting and annotating a new dataset for deep video understanding. The proposed dataset is composed of 3 seasons (15 episodes) of the BBC Land Girls TV Series in addition to 14 Creative Common movies with total duration of 28.5 hr. The main contribution of this paper is a novel annotation framework on the movie and scene levels to support an automatic query generation process that can capture the high-level movie features (e.g. how characters and locations are related to each other) as well as fine grained scene-level features (e.g. character interactions, natural language descriptions, and sentiments). Movie-level annotations include constructing a global static knowledge graph (KG) to capture major relationships, while the scene-level annotations include constructing a sequence of knowledge graphs (KGs) to capture fine-grained features. The annotation framework supports generating multiple query types. The objective of the framework is to provide a guide to annotating long duration videos to support tasks and challenges in the video and multimedia understanding domains. These tasks and challenges can support testing automatic systems on their ability to learn and comprehend a movie or long video in terms of actors, entities, events, interactions and their relationship to each other.

2005

pdf bib
Novelty Detection: The TREC Experience
Ian Soboroff | Donna Harman
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing