2025
CLERC: A Dataset for U. S. Legal Case Retrieval and Retrieval-Augmented Analysis Generation
Abe Bohan Hou | Orion Weller | Guanghui Qin | Eugene Yang | Dawn Lawrie | Nils Holzenberger | Andrew Blair-Stanek | Benjamin Van Durme
Findings of the Association for Computational Linguistics: NAACL 2025
Legal professionals need to write analyses that rely on citations to relevant precedents, i.e., previous case decisions. Intelligent systems assisting legal professionals in writing such documents provide great benefits but are challenging to design. Such systems need to help locate, summarize, and reason over salient precedents in order to be useful. To enable systems for such tasks, we work with legal professionals to create a colossal dataset supporting two important backbone tasks: information retrieval (IR) and retrieval-augmented generation (RAG). This dataset, **CLERC** (Case Law Evaluation and Retrieval Corpus), is constructed for training and evaluating models on their ability to (1) find corresponding citations for a given piece of legal analysis and (2) compile the text of these citations (as well as previous context) into a cogent analysis that supports a reasoning goal. We benchmark state-of-the-art models on CLERC, showing that current approaches still struggle: GPT-4o generates analyses with the highest ROUGE F-scores but hallucinates the most, while zero-shot IR models achieve only 48.3% recall@1000.
2021
ToxCCIn: Toxic Content Classification with Interpretability
Tong Xiang | Sean MacAvaney | Eugene Yang | Nazli Goharian
Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis
Despite the recent successes of transformer-based models in terms of effectiveness on a variety of tasks, their decisions often remain opaque to humans. Explanations are particularly important for tasks like offensive language or toxicity detection on social media because a manual appeal process is often in place to dispute automatically flagged content. In this work, we propose a technique to improve the interpretability of these models, based on a simple and powerful assumption: a post is at least as toxic as its most toxic span. We incorporate this assumption into transformer models by scoring a post based on the maximum toxicity of its spans and augmenting the training process to identify correct spans. We find that this approach is effective and that, according to a human study, it can produce explanations that exceed the quality of those provided by Logistic Regression analysis (often regarded as a highly interpretable model).
2020
GUIR at SemEval-2020 Task 12: Domain-Tuned Contextualized Models for Offensive Language Detection
Sajad Sotudeh | Tong Xiang | Hao-Ren Yao | Sean MacAvaney | Eugene Yang | Nazli Goharian | Ophir Frieder
Proceedings of the Fourteenth Workshop on Semantic Evaluation
Offensive language detection is an important and challenging task in natural language processing. We present our submissions to the OffensEval 2020 shared task, which includes three English sub-tasks: identifying the presence of offensive language (Sub-task A), identifying the presence of a target in offensive language (Sub-task B), and identifying the categories of the target (Sub-task C). Our experiments explore using a domain-tuned contextualized language model (namely, BERT) for this task. We also experiment with different components and configurations (e.g., a multi-view SVM) stacked upon BERT models for specific sub-tasks. Our submissions achieve F1 scores of 91.7% in Sub-task A, 66.5% in Sub-task B, and 63.2% in Sub-task C. We perform an ablation study, which reveals that domain tuning considerably improves classification performance. Furthermore, error analysis shows common misclassification errors made by our model and outlines directions for future research.
2015
Identifying Political Sentiment between Nation States with Social Media
Nathanael Chambers | Victor Bowen | Ethan Genco | Xisen Tian | Eric Young | Ganesh Harihara | Eugene Yang
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
2013
USNA: A Dual-Classifier Approach to Contextual Sentiment Analysis
Ganesh Harihara | Eugene Yang | Nathanael Chambers
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013)