Homaira Huda Shomee


2025

A Survey on Patent Analysis: From NLP to Multimodal AI
Homaira Huda Shomee | Zhu Wang | Sathya N. Ravi | Sourav Medya
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Recent advances in Pretrained Language Models (PLMs) and Large Language Models (LLMs) have demonstrated transformative capabilities across diverse domains. The field of patent analysis and innovation is no exception: natural language processing (NLP) techniques present opportunities to streamline and enhance important tasks, such as patent classification and patent retrieval, throughout the patent life cycle. This not only improves the efficiency of patent researchers and applicants, but also opens new avenues for technological innovation and discovery. Our survey provides a comprehensive summary of recent NLP-based methods, including multimodal ones, in patent analysis. We also introduce a novel taxonomy that categorizes methods by the patent life cycle tasks they address as well as by their technical specifics. This interdisciplinary survey aims to serve as a comprehensive resource for researchers and practitioners working at the intersection of NLP, Multimodal AI, and patent analysis, as well as for patent offices seeking to build efficient patent systems.

DesignCLIP: Multimodal Learning with CLIP for Design Patent Understanding
Zhu Wang | Homaira Huda Shomee | Sathya N. Ravi | Sourav Medya
Findings of the Association for Computational Linguistics: EMNLP 2025

In the field of design patent analysis, traditional tasks such as patent classification and patent image retrieval depend heavily on image data. However, patent images, which typically consist of sketches depicting the abstract and structural elements of an invention, often fall short in conveying comprehensive visual context and semantic information. This inadequacy can lead to ambiguities during prior art searches. Recent advances in vision-language models, such as CLIP, offer promising opportunities for more reliable and accurate AI-driven patent analysis. In this work, we leverage CLIP to develop DesignCLIP, a unified framework for design patent applications built on a large-scale dataset of U.S. design patents. To address the unique characteristics of patent data, DesignCLIP incorporates class-aware classification and contrastive learning, drawing on detailed generated captions for patent images and multi-view image learning. We validate the effectiveness of DesignCLIP across downstream tasks including patent classification and patent retrieval. Additionally, we explore multimodal patent retrieval, which has the potential to enhance creativity and innovation in design by offering more diverse sources of inspiration. Our experiments show that DesignCLIP consistently outperforms baseline and state-of-the-art models in the patent domain on all tasks. Our findings underscore the promise of multimodal approaches in advancing patent analysis. The codebase is available here: https://github.com/AI4Patents/DesignCLIP.
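To make the contrastive component of the abstract concrete, below is a minimal sketch of CLIP-style image-caption alignment on patent data. It assumes the Hugging Face transformers library and a pretrained OpenAI CLIP checkpoint; the training step, data, and hyperparameters are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of CLIP-style contrastive fine-tuning on patent
# image-caption pairs, in the spirit of DesignCLIP. Checkpoint,
# learning rate, and data handling are illustrative assumptions.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def contrastive_step(images, captions):
    """One training step: align patent sketches with their generated captions.

    `images` is a list of PIL images; `captions` is a parallel list of strings.
    """
    inputs = processor(text=captions, images=images,
                       return_tensors="pt", padding=True, truncation=True)
    outputs = model(**inputs)
    logits = outputs.logits_per_image       # (batch, batch) similarity matrix
    targets = torch.arange(len(images))     # matched pairs lie on the diagonal
    # Symmetric InfoNCE loss over image-to-text and text-to-image directions.
    loss = (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The symmetric loss pulls each sketch toward its own caption and away from the other captions in the batch, which is what lets the shared embedding space support both classification and image-text retrieval downstream.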