Gerardo Ocampo Diaz

2024

Measuring Cross-Text Cohesion for Segmentation Similarity Scoring
Gerardo Ocampo Diaz | Jessica Ouyang
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Text segmentation is the task of dividing a sequence of text elements (e.g., words, sentences, or paragraphs) into meaningful chunks. Although exciting advances are being made in modern segmentation-based tasks, such as automatically generating podcast chapters, current segmentation similarity metrics share a critical weakness: they are content-agnostic. In this paper, we present a word-embedding-based metric of cross-textual cohesion based on the formal linguistic definition of cohesion and incorporate it into a new segmentation similarity metric, SED. SED is capable of providing fine-grained segmentation similarity scoring for the 3 basic segmentation errors (transposition, insertion, and deletion), as well as mixtures of them, avoiding the limitations of traditional metrics. We discuss the benefits of SED and evaluate its alignment with human judgement for each of the 3 basic error types. We show that our metric aligns with human evaluations significantly more than traditional metrics. We briefly discuss future work, such as the integration of anaphora resolution into our cohesion-based metric, and make our code publicly available.

2022

An Alignment-based Approach to Text Segmentation Similarity Scoring
Gerardo Ocampo Diaz | Jessica Ouyang
Proceedings of the 26th Conference on Computational Natural Language Learning (CoNLL)

Text segmentation is a natural language processing task with popular applications, such as topic segmentation, element discourse extraction, and sentence tokenization. Much work has been done to develop accurate segmentation similarity metrics, but even the most advanced metrics used today, B and WindowDiff, exhibit incorrect behavior due to their evaluation of boundaries in isolation. In this paper, we present a new segment-alignment based approach to segmentation similarity scoring and a new similarity metric A. We show that A does not exhibit the erratic behavior of B and WindowDiff, quantify the likelihood of B and WindowDiff misbehaving through simulation, and discuss the versatility of alignment-based approaches for segmentation similarity scoring. We make our implementation of A publicly available and encourage the community to explore more sophisticated approaches to text segmentation similarity scoring.

2020

Aspect-Based Sentiment Analysis as Fine-Grained Opinion Mining
Gerardo Ocampo Diaz | Xuanming Zhang | Vincent Ng
Proceedings of the Twelfth Language Resources and Evaluation Conference

We show how the general fine-grained opinion mining concepts of opinion target and opinion expression are related to aspect-based sentiment analysis (ABSA) and discuss their benefits for resource creation over popular ABSA annotation schemes. Specifically, we first discuss why opinions modeled solely in terms of (entity, aspect) pairs inadequately capture the meaning of the sentiment originally expressed by authors and how opinion expressions and opinion targets can be used to avoid the loss of information. We then design a meaning-preserving annotation scheme and apply it to two popular ABSA datasets, the 2016 SemEval ABSA Restaurant and Laptop datasets. Finally, we discuss the importance of opinion expressions and opinion targets for next-generation ABSA systems. We make our datasets publicly available for download.

2018

Modeling and Prediction of Online Product Review Helpfulness: A Survey
Gerardo Ocampo Diaz | Vincent Ng
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

As the number of free-form user-generated reviews on e-commerce websites continues to grow, there is an increasing need for automatic mechanisms that sift through the vast amounts of user reviews and identify quality content. Review helpfulness modeling is a task that studies the mechanisms that affect review helpfulness and attempts to accurately predict it. This paper provides an overview of the most relevant work in helpfulness prediction and understanding in the past decade, discusses the insights gained from said work, and provides guidelines for future research.