Faysal Fateh


2025

Restaurant Menu Categorization at Scale: LLM-Guided Hybrid Clustering
Seemab Latif | Ashar Mehmood | Selim Turki | Huma Ameer | Ivan Gorban | Faysal Fateh
Proceedings of the 18th International Natural Language Generation Conference

Inconsistent naming of menu items across merchants presents a major challenge for businesses that rely on large-scale menu item catalogs. It hinders downstream tasks such as pricing analysis, menu item deduplication, and recommendation. To address this, we propose the Cross-Platform Semantic Alignment Framework (CPSAF), a hybrid approach that integrates DBSCAN-based clustering with SIGMA (Semantic Item Grouping and Menu Abstraction), a refinement module based on a large language model (LLM). SIGMA uses in-context learning with the LLM to generate generic menu item names and categories. We evaluate our framework on a proprietary dataset comprising over 700,000 unique menu items. Experiments involve tuning DBSCAN parameters and applying SIGMA to refine the resulting clusters. Performance is assessed using structural metrics (cluster count and coverage) and semantic metrics (intra- and inter-cluster similarity), along with manual qualitative inspection. CPSAF improves intra-cluster similarity from 0.88 to 0.98 and reduces singleton clusters by 33%, demonstrating its effectiveness in recovering soft semantic drift.
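
The clustering and evaluation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 2-D vectors stand in for menu item embeddings, the `eps` and `min_samples` values are arbitrary, the use of cosine distance is an assumption, and the metric definitions (coverage as the share of items assigned to a cluster, similarity as mean pairwise cosine similarity) are plausible readings rather than definitions taken from the paper. The SIGMA refinement step, which requires an LLM, is not shown.

```python
from itertools import combinations
from math import sqrt

NOISE = -1  # label for items DBSCAN leaves unclustered


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))


def dbscan(vectors, eps, min_samples):
    """Plain DBSCAN over cosine distance (1 - cosine similarity)."""
    n = len(vectors)
    # Each point counts as its own neighbor, so min_samples includes the point itself.
    neighbors = [
        [j for j in range(n) if 1.0 - cosine(vectors[i], vectors[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
        cluster += 1
    return labels


def mean(values):
    return sum(values) / len(values) if values else None


def evaluate(vectors, labels):
    """Structural and semantic metrics under the assumptions stated above."""
    clusters = {}
    for vec, lab in zip(vectors, labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(vec)
    groups = list(clusters.values())
    intra = [cosine(u, v) for g in groups for u, v in combinations(g, 2)]
    inter = [cosine(u, v) for a, b in combinations(groups, 2) for u in a for v in b]
    return {
        "cluster_count": len(groups),
        "coverage": sum(len(g) for g in groups) / len(labels),
        "unclustered": labels.count(NOISE),
        "intra_similarity": mean(intra),
        "inter_similarity": mean(inter),
    }


# Hypothetical embeddings: two tight groups plus one item that fits neither.
vectors = [
    (1.0, 0.0), (0.99, 0.05), (0.98, 0.1),
    (0.0, 1.0), (0.05, 0.99), (0.1, 0.98),
    (0.7, 0.7),
]
labels = dbscan(vectors, eps=0.02, min_samples=2)
metrics = evaluate(vectors, labels)
print(labels)
print(metrics)
```

On real data, the quadratic neighbor search here would be too slow for a catalog of several hundred thousand items; an indexed implementation such as the one in scikit-learn would be the more practical choice.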