2024
Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing
Chung-Chi Chen, Xiaomo Liu, Udo Hahn, Armineh Nourbakhsh, Zhiqiang Ma, Charese Smiley, Veronique Hoste, Sanjiv Ranjan Das, Manling Li, Mohammad Ghassemi, Hen-Hsen Huang, Hiroya Takamura, Hsin-Hsi Chen
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
Ran Zmigrod, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah
Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)
Visually Rich Form Understanding (VRFU) poses a complex research problem due to the documents’ highly structured nature and yet highly variable style and content. Current annotation schemes decompose form understanding and omit key hierarchical structure, making development and evaluation of end-to-end models difficult. In this paper, we propose a novel F1 metric to evaluate form parsers and describe a new content-agnostic, tree-based annotation scheme for VRFU: TreeForm. We provide methods to convert previous annotation schemes into TreeForm structures and evaluate TreeForm predictions using a modified version of the normalized tree-edit distance. We present initial baselines for our end-to-end performance metric and the TreeForm edit distance, averaged over the FUNSD and XFUND datasets, of 61.5 and 26.4, respectively. We hope that TreeForm encourages deeper research in annotating, modeling, and evaluating the complexities of form-like documents.
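As a rough illustration of the evaluation the abstract describes, the sketch below computes a normalized tree-edit distance between a gold and a predicted form tree using the zss (Zhang-Shasha) library. The toy trees and the normalization by combined node count are assumptions for illustration; the paper uses its own modified normalization.

```python
# Hedged sketch: normalized tree-edit distance between form-annotation trees.
# Requires: pip install zss. Trees and normalization are illustrative only.
from zss import Node, simple_distance

def count_nodes(tree):
    """Recursively count the nodes of a zss tree."""
    return 1 + sum(count_nodes(c) for c in Node.get_children(tree))

def normalized_ted(gold, pred):
    """Tree-edit distance scaled by the combined size of both trees."""
    return simple_distance(gold, pred) / (count_nodes(gold) + count_nodes(pred))

# Toy gold/predicted parses: a form section with a question-answer pair.
gold = Node("form").addkid(
    Node("section").addkid(Node("question")).addkid(Node("answer"))
)
pred = Node("form").addkid(
    Node("section").addkid(Node("question"))  # parser missed the answer node
)

print(f"normalized TED: {normalized_ted(gold, pred):.3f}")  # 0.143 (1 edit / 7 nodes)
```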
Towards a new research agenda for multimodal enterprise document understanding: What are we missing?
Armineh Nourbakhsh, Sameena Shah, Carolyn Rose
Findings of the Association for Computational Linguistics: ACL 2024
The field of multimodal document understanding has produced a suite of models that have achieved stellar performance across several tasks, even coming close to human performance on certain benchmarks. Nevertheless, the application of these models to real-world enterprise datasets remains constrained by a number of limitations. In this position paper, we discuss these limitations in the context of three key aspects of research: dataset curation, model development, and evaluation on downstream tasks. By analyzing 14 datasets and 7 SotA models, we identify major gaps in their utility in the context of a real-world scenario. We demonstrate how each limitation impedes the widespread use of SotA models in enterprise settings, and present a set of research challenges that are motivated by these limitations. Lastly, we propose a research agenda that is aimed at driving the field towards higher impact in enterprise applications.
DocLLM: A Layout-Aware Generative Language Model for Multimodal Document Understanding
Dongsheng Wang, Natraj Raman, Mathieu Sibue, Zhiqiang Ma, Petr Babkin, Simerjot Kaur, Yulong Pei, Armineh Nourbakhsh, Xiaomo Liu
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Enterprise documents such as forms, receipts, reports, and other such records often carry rich semantics at the intersection of textual and spatial modalities. The visual cues offered by their complex layouts play a crucial role in comprehending these documents effectively. In this paper, we present DocLLM, a lightweight extension to traditional large language models (LLMs) for reasoning over visual documents, taking into account both textual semantics and spatial layout. Our model differs from existing multimodal LLMs by avoiding expensive image encoders and focusing exclusively on bounding box information to incorporate the spatial layout structure. Specifically, the cross-alignment between text and spatial modalities is captured by decomposing the attention mechanism in classical transformers into a set of disentangled matrices. Furthermore, we devise a pre-training objective that learns to infill text segments. This approach allows us to address irregular layouts and heterogeneous content frequently encountered in visual documents. The pre-trained model is fine-tuned using a large-scale instruction dataset covering four core document intelligence tasks. We demonstrate that our solution outperforms SotA LLMs on 14 out of 16 datasets across all tasks, and generalizes well to 4 out of 5 previously unseen datasets.
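A minimal sketch, in PyTorch, of the disentangled-attention idea the abstract outlines: attention scores are a weighted sum of text-to-text, text-to-spatial, spatial-to-text, and spatial-to-spatial terms, each with its own projections. This is a single-head toy, not the released DocLLM code; the dimensions and lambda weights are assumptions.

```python
import math
import torch
import torch.nn as nn

class DisentangledSpatialAttention(nn.Module):
    """Single-head attention whose scores mix text and bounding-box streams."""

    def __init__(self, d_model: int, lambdas=(1.0, 1.0, 1.0)):
        super().__init__()
        # Separate query/key projections for the text and spatial streams.
        self.q_t = nn.Linear(d_model, d_model)
        self.k_t = nn.Linear(d_model, d_model)
        self.q_s = nn.Linear(d_model, d_model)
        self.k_s = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.lambdas = lambdas
        self.scale = 1.0 / math.sqrt(d_model)

    def forward(self, text, boxes):
        # text:  (batch, seq, d_model) token embeddings
        # boxes: (batch, seq, d_model) encoded bounding-box embeddings
        qt, kt = self.q_t(text), self.k_t(text)
        qs, ks = self.q_s(boxes), self.k_s(boxes)
        l1, l2, l3 = self.lambdas
        # Disentangled scores: text/text plus cross and spatial/spatial terms.
        scores = (qt @ kt.transpose(-2, -1)
                  + l1 * qt @ ks.transpose(-2, -1)
                  + l2 * qs @ kt.transpose(-2, -1)
                  + l3 * qs @ ks.transpose(-2, -1)) * self.scale
        return scores.softmax(dim=-1) @ self.v(text)

# Usage: 2 documents, 16 tokens each, model width 64.
layer = DisentangledSpatialAttention(d_model=64)
out = layer(torch.randn(2, 16, 64), torch.randn(2, 16, 64))
print(out.shape)  # torch.Size([2, 16, 64])
```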
2023
Using counterfactual contrast to improve compositional generalization for multi-step quantitative reasoning
Armineh Nourbakhsh, Sameena Shah, Carolyn Rosé
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
In quantitative question answering, compositional generalization is one of the main challenges for state-of-the-art models, especially when longer sequences of reasoning steps are required. In this paper we propose CounterComp, a method that uses counterfactual scenarios to generate samples with compositional contrast. Instead of a data augmentation approach, CounterComp is based on metric learning, which allows for direct sampling from the training set and circumvents the need for additional human labels. Our proposed auxiliary metric learning loss improves the performance of three state-of-the-art models on four recently released datasets. We also show how the approach can improve OOD performance on unseen domains, as well as unseen compositions. Lastly, we demonstrate how the method can lead to better compositional attention patterns during training.
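One plausible reading of the auxiliary loss, sketched below: encodings of samples that share a reasoning program are pulled together, while a counterfactual contrast with a different program is pushed away under a triplet margin. The mean pooling, cosine distance, margin, and loss weight are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def counterfactual_triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet loss over mean-pooled encoder states of shape (batch, seq, dim)."""
    a = F.normalize(anchor.mean(dim=1), dim=-1)
    p = F.normalize(positive.mean(dim=1), dim=-1)
    n = F.normalize(negative.mean(dim=1), dim=-1)
    d_pos = 1.0 - (a * p).sum(-1)  # cosine distance to a same-program sample
    d_neg = 1.0 - (a * n).sum(-1)  # cosine distance to a counterfactual sample
    return F.relu(d_pos - d_neg + margin).mean()

# Usage: added to the main QA objective with a small weight, e.g.
#   loss = qa_loss + 0.1 * counterfactual_triplet_loss(h_a, h_p, h_n)
h_a, h_p, h_n = (torch.randn(4, 32, 256) for _ in range(3))
print(counterfactual_triplet_loss(h_a, h_p, h_n).item())
```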
2022
Improving compositional generalization for multi-step quantitative reasoning in question answering
Armineh Nourbakhsh, Cathy Jiao, Sameena Shah, Carolyn Rosé
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Quantitative reasoning is an important aspect of question answering, especially when numeric and verbal cues interact to indicate sophisticated, multi-step programs. In this paper, we demonstrate how modeling the compositional nature of quantitative text can enhance the performance and robustness of QA models, allowing them to capture arithmetic logic that is expressed verbally. Borrowing from the literature on semantic parsing, we propose a method that encourages QA models to adjust their attention patterns and capture input/output alignments that are meaningful to the reasoning task. We show how this strategy improves program accuracy and renders the models more robust against overfitting as the number of reasoning steps grows. Our approach is designed as a standalone module that can be prepended to many existing models and trained in an end-to-end fashion without the need for an additional supervisory signal. As part of this exercise, we also create a unified dataset building on four previously released numerical QA datasets over tabular data.
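The abstract leaves the module's internals open; below is one hedged sketch of attention-alignment supervision: each reasoning step's attention over the input is nudged toward a target distribution over that step's operand tokens. The KL objective and the uniform operand mass are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def alignment_loss(attn, operand_mask):
    """
    attn:         (batch, steps, seq) attention from each reasoning step to inputs
    operand_mask: (batch, steps, seq) 1.0 where a token is an operand of that step
    """
    # Spread each step's target mass uniformly over its operand tokens.
    target = operand_mask / operand_mask.sum(dim=-1, keepdim=True).clamp(min=1.0)
    return F.kl_div(attn.clamp(min=1e-9).log(), target, reduction="batchmean")

# Usage: a 2-step program over 10 input tokens, batch of 1.
attn = torch.softmax(torch.randn(1, 2, 10), dim=-1)
mask = torch.zeros(1, 2, 10)
mask[0, 0, [2, 5]] = 1.0  # step 1 reads tokens 2 and 5
mask[0, 1, [7]] = 1.0     # step 2 reads token 7
print(alignment_loss(attn, mask).item())
```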
2017
funSentiment at SemEval-2017 Task 4: Topic-Based Message Sentiment Classification by Exploiting Word Embeddings, Text Features and Target Contexts
Quanzhi Li, Armineh Nourbakhsh, Xiaomo Liu, Rui Fang, Sameena Shah
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
This paper describes the approach we used for SemEval-2017 Task 4: Sentiment Analysis in Twitter. Topic-based (target-dependent) sentiment analysis has recently attracted interest and been used in several applications, but it remains a challenging research task. In our approach, we take the left and right context of a target into consideration when generating polarity classification features. We use two types of word embeddings in our classifiers: general word embeddings learned from 200 million tweets, and sentiment-specific word embeddings learned from 10 million tweets using distant supervision. We also incorporate a text feature model in our algorithm. This model produces features based on text negation, the tf-idf weighting scheme, and a Rocchio text classification method. We participated in four subtasks (B, C, D & E for English), all of which are about topic-based message polarity classification. Our team is ranked #6 in subtask B, #3 by MAEu and #9 by MAEm in subtask C, #3 using RAE and #6 using KLD in subtask D, and #3 in subtask E.
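A toy sketch of the target-context feature idea: embeddings of the words to the left and to the right of the target are mean-pooled separately and concatenated into a polarity feature. The 4-dimensional vectors stand in for the tweet-trained embeddings; all names and values are illustrative.

```python
import numpy as np

# Toy stand-ins for word vectors trained on tweets (real ones are 100-300 dims).
EMB = {
    "love":   np.array([0.9, 0.1, 0.0, 0.2]),
    "the":    np.array([0.0, 0.0, 0.1, 0.0]),
    "new":    np.array([0.1, 0.3, 0.0, 0.1]),
    "update": np.array([0.2, 0.1, 0.4, 0.0]),
}
DIM = 4

def pool(words):
    """Mean-pool embeddings; zero vector for an empty context."""
    vecs = [EMB.get(w, np.zeros(DIM)) for w in words]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

def context_features(tokens, target):
    """Concatenate pooled left-context and right-context embeddings of the target."""
    i = tokens.index(target)
    return np.concatenate([pool(tokens[:i]), pool(tokens[i + 1:])])

print(context_features(["love", "the", "new", "update"], target="new"))  # 8-dim
```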
funSentiment at SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs Using Word Vectors Built from StockTwits and Twitter
Quanzhi Li, Sameena Shah, Armineh Nourbakhsh, Rui Fang, Xiaomo Liu
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
This paper describes the approach we used for SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs. We use three types of word embeddings in our algorithm: word embeddings learned from 200 million tweets, sentiment-specific word embeddings learned from 10 million tweets using distant supervision, and word embeddings learned from 20 million StockTwits messages. In our approach, we also take the left and right context of the target company into consideration when generating polarity prediction features. All the features generated from the different word embeddings and contexts are integrated to train our algorithm.
2016
Witness Identification in Twitter
Rui Fang, Armineh Nourbakhsh, Xiaomo Liu, Sameena Shah, Quanzhi Li
Proceedings of the Fourth International Workshop on Natural Language Processing for Social Media