2025
pdf
bib
abs
Neural Document Segmentation Using Weighted Sliding Windows with Transformer Encoders
Saeed Abbasi
|
Aijun An
|
Heidar Davoudi
|
Ron Di Carlantonio
|
Gary Farmaner
Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
We introduce a novel Transformer-based method for document segmentation, tailored for practical, real-world applications. This method utilizes overlapping text sequences with a unique position-aware weighting mechanism to enhance segmentation accuracy. Through comprehensive experiments on both public and proprietary datasets, we demonstrate significant improvements, establishing new state-of-the-art standards by achieving up to a 10% increase in segmentation F1 score compared to existing methods. Additionally, we explore the application of our segmentation method in downstream retrieval-augmented question answering tasks, where it improves the quality of generated responses by 5% while achieving up to four times greater efficiency. These results underscore our model’s potential as a robust and scalable solution for real-world text segmentation challenges.
pdf
bib
abs
Generating Spatial Knowledge Graphs from Automotive Diagrams for Question Answering
Steve Bakos
|
Chen Xing
|
Heidar Davoudi
|
Aijun An
|
Ron DiCarlantonio
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Answering “Where is the X button?” with “It’s next to the Y button” is unhelpful if the user knows neither location. Useful answers require obvious landmarks as a reference point. We address this by generating from a vehicle dashboard diagram a spatial knowledge graph (SKG) that shows the spatial relationship between a dashboard component and its nearby landmarks and using the SKG to help answer questions. We evaluate three distinct generation pipelines (Per-Attribute, Per-Component, and a Single-Shot baseline) to create the SKG using Large Vision-Language Models (LVLMs). On a new 65-vehicle dataset, we demonstrate that a decomposed Per-Component pipeline is the most effective strategy for generating a high-quality SKG; the graph produced by this method, when evaluated with a novel Significance score, identifies landmarks achieving 71.3% agreement with human annotators. This work enables downstream QA systems to provide more intuitive, landmark-based answers.
pdf
bib
abs
GEAR: A Scalable and Interpretable Evaluation Framework for RAG-Based Car Assistant Systems
Niloufar Beyranvand
|
Hamidreza Dastmalchi
|
Aijun An
|
Heidar Davoudi
|
Winston Chan
|
Ron DiCarlantonio
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Large language models (LLMs) increasingly power car assistants, enabling natural language interaction for tasks such as maintenance, troubleshooting, and operational guidance. While retrieval-augmented generation (RAG) improves grounding using vehicle manuals, evaluating response quality remains a key challenge. Traditional metrics like BLEU and ROUGE fail to capture critical aspects such as factual accuracy and information coverage. We propose GEAR, a fully automated, reference-based evaluation framework for car assistant systems. GEAR uses LLMs as evaluators to compare assistant responses against ground-truth counterparts, assessing coverage, correctness, and other dimensions of answer quality. To enable fine-grained evaluation, both responses are decomposed into key facts and labeled as essential, optional, or safety-critical using LLMs. The evaluator then determines which of these facts are correct and covered. Experiments show that GEAR aligns closely with human annotations, offering a scalable and reliable solution for evaluating car assistants.
2024
pdf
bib
abs
Generating Vehicular Icon Descriptions and Indications Using Large Vision-Language Models
James Fletcher
|
Nicholas Dehnen
|
Seyed Nima Tayarani Bathaie
|
Aijun An
|
Heidar Davoudi
|
Ron DiCarlantonio
|
Gary Farmaner
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track
To enhance a question-answering system for automotive drivers, we tackle the problem of automatic generation of icon image descriptions. The descriptions can match the driver’s query about the icon appearing on the dashboard and tell the driver what is happening so that they may take an appropriate action. We use three state-of-the-art large vision-language models to generate both visual and functional descriptions based on the icon image and its context information in the car manual. Both zero-shot and few-shot prompts are used. We create a dataset containing over 400 icons with their ground-truth descriptions and use it to evaluate model-generated descriptions across several performance metrics. Our evaluation shows that two of these models (GPT-4o and Claude 3.5) performed well on this task, while the third model (LLaVA-NEXT) performs poorly.
2023
pdf
bib
abs
Question Generation Using Sequence-to-Sequence Model with Semantic Role Labels
Alireza Naeiji
|
Aijun An
|
Heidar Davoudi
|
Marjan Delpisheh
|
Muath Alzghool
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics
Automatic generation of questions from text has gained increasing attention due to its useful applications. We propose a novel question generation method that combines the benefits of rule-based and neural sequence-to-sequence (Seq2Seq) models. The proposed method can automatically generate multiple questions from an input sentence covering different views of the sentence as in rule-based methods, while more complicated “rules” can be learned via the Seq2Seq model. The method utilizes semantic role labeling to convert training examples into their semantic representations, and then trains a Seq2Seq model over the semantic representations. Our extensive experiments on three real-world data sets show that the proposed method significantly improves the state-of-the-art neural question generation approaches.
2020
pdf
bib
abs
Affective and Contextual Embedding for Sarcasm Detection
Nastaran Babanejad
|
Heidar Davoudi
|
Aijun An
|
Manos Papagelis
Proceedings of the 28th International Conference on Computational Linguistics
Automatic sarcasm detection from text is an important classification task that can help identify the actual sentiment in user-generated data, such as reviews or tweets. Despite its usefulness, sarcasm detection remains a challenging task, due to a lack of any vocal intonation or facial gestures in textual data. To date, most of the approaches to addressing the problem have relied on hand-crafted affect features, or pre-trained models of non-contextual word embeddings, such as Word2vec. However, these models inherit limitations that render them inadequate for the task of sarcasm detection. In this paper, we propose two novel deep neural network models for sarcasm detection, namely ACE 1 and ACE 2. Given as input a text passage, the models predict whether it is sarcastic (or not). Our models extend the architecture of BERT by incorporating both affective and contextual features. To the best of our knowledge, this is the first attempt to directly alter BERT’s architecture and train it from scratch to build a sarcasm classifier. Extensive experiments on different datasets demonstrate that the proposed models outperform state-of-the-art models for sarcasm detection with significant margins.
2019
pdf
bib
abs
Content-based Dwell Time Engagement Prediction Model for News Articles
Heidar Davoudi
|
Aijun An
|
Gordon Edall
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Industry Papers)
The article dwell time (i.e., expected time that users spend on an article) is among the most important factors showing the article engagement. It is of great interest to predict the dwell time of an article before its release. This allows digital newspapers to make informed decisions and publish more engaging articles. In this paper, we propose a novel content-based approach based on a deep neural network architecture for predicting article dwell times. The proposed model extracts emotion, event and entity features from an article, learns interactions among them, and combines the interactions with the word-based features of the article to learn a model for predicting the dwell time. The experimental results on a real dataset from a major newspaper show that the proposed model outperforms other state-of-the-art baselines.