Huihui Xu


2025

Label-Free Distinctiveness: Building a Continuous Trademark Scale via Synthetic Anchors
Huihui Xu | Kevin D. Ashley
Proceedings of the Natural Legal Language Processing Workshop 2025

Trademark law protects distinctive marks that are able to identify and distinguish goods or services. The Abercrombie spectrum classifies marks from generic to fanciful based on distinctiveness. However, the spectrum employs hard buckets, while the real world of branding rarely falls into neat bins: marks often hover at the blurry border between “descriptive” and “suggestive,” for example. By requiring trademark examiners or researchers to pick one of the five buckets, one loses useful information where the lines get blurry. So hard boundaries obscure valuable gradations of meaning. In this work, we explore creating a continuous ruler of distinctiveness as a complementary diagnostic tool to the original buckets. The result is a label-free ladder, where every mark, real or synthetic, gets a real-valued score. These continuous scores reveal subtle distinctions among marks and provide interpretable visualizations that help practitioners understand where a mark falls relative to established anchors. Testing with 95 expert-classified trademark examples achieves Spearman’s ρ = 0.718 and Pearson’s r = 0.724 against human labels, while offering intuitive visualizations on the continuous spectrum. A demo can be found at https://distinctiveness-ruler-demo.streamlit.app/.
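
As an illustration only (not code from the paper): assuming the continuous distinctiveness scores and the expert Abercrombie labels (encoded ordinally, e.g. generic = 0 through fanciful = 4) are available as parallel arrays, agreement figures like the Spearman and Pearson correlations reported above could be computed with scipy. All variable names and example values below are hypothetical.

    # Hypothetical sketch: comparing continuous distinctiveness scores
    # against ordinal expert labels. Values are illustrative only.
    from scipy.stats import spearmanr, pearsonr

    # Expert labels on the Abercrombie spectrum, encoded ordinally:
    # generic=0, descriptive=1, suggestive=2, arbitrary=3, fanciful=4
    expert_labels = [0, 1, 1, 2, 3, 4, 2, 3]

    # Real-valued scores a continuous "ruler" might assign to the same marks
    model_scores = [0.05, 0.31, 0.42, 0.55, 0.78, 0.97, 0.48, 0.73]

    rho, rho_p = spearmanr(model_scores, expert_labels)  # rank correlation
    r, r_p = pearsonr(model_scores, expert_labels)       # linear correlation

    print(f"Spearman's rho = {rho:.3f} (p = {rho_p:.3f})")
    print(f"Pearson's r    = {r:.3f} (p = {r_p:.3f})")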

2024

Adding Argumentation into Human Evaluation of Long Document Abstractive Summarization: A Case Study on Legal Opinions
Mohamed Elaraby | Huihui Xu | Morgan Gray | Kevin Ashley | Diane Litman
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024

Human evaluation remains the gold standard for assessing abstractive summarization. However, current practices often prioritize constructing evaluation guidelines for fluency, coherence, and factual accuracy, overlooking other critical dimensions. In this paper, we investigate argument coverage in abstractive summarization by focusing on long legal opinions, where summaries must effectively encapsulate the document’s argumentative nature. We introduce a set of human-evaluation guidelines to evaluate generated summaries based on argumentative coverage. These guidelines enable us to assess three distinct summarization models, studying the influence of including argument roles in summarization. Furthermore, we utilize these evaluation scores to benchmark automatic summarization metrics against argument coverage, providing insights into the effectiveness of automated evaluation methods.