Domain Representative Keywords Selection: A Probabilistic Approach

Pritom Saha Akash, Jie Huang, Kevin Chang, Yunyao Li, Lucian Popa, ChengXiang Zhai


Abstract
We propose a probabilistic approach for selecting, from a candidate set, a subset of keywords that represent a target domain in contrast with a context domain. This task is crucial for many downstream tasks in natural language processing. To contrast the target domain with the context domain, we adapt the two-component mixture model to generate a distribution over candidate keywords. This distribution gives more weight to keywords that are distinctive of the target domain than to keywords it shares with the context domain. To ensure that the selected keywords are representative of the target domain, we introduce an optimization algorithm that selects the subset from the generated candidate distribution. We show that the optimization algorithm can be implemented efficiently with a near-optimal approximation guarantee. Finally, extensive experiments on multiple domains demonstrate that our approach outperforms the baselines on the tasks of keyword summary generation and trending keyword selection.
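To make the contrast step concrete, the following is a minimal sketch of one common two-component mixture formulation, fitted with EM. It is an illustration under stated assumptions and not necessarily the model used in the paper: the function name `mixture_keyword_distribution`, the mixing weight `lam`, the add-one smoothing of the context model, and the toy counts are all hypothetical. Each occurrence of a candidate keyword in the target domain is assumed to come either from a target-specific distribution or from a fixed context-domain distribution, so keywords that are frequent in the context domain end up with little target-specific probability.

```python
def mixture_keyword_distribution(target_counts, context_counts, lam=0.5, iters=50):
    """Estimate a target-specific keyword distribution with EM.

    Assumed model (illustrative only): each keyword occurrence in the target
    domain is drawn from (1 - lam) * p(w | target) + lam * p(w | context),
    where p(w | context) is fixed and p(w | target) is estimated.
    """
    vocab = list(target_counts)
    context_total = sum(context_counts.values())

    # Fixed context model, add-one smoothed so no candidate has zero probability.
    background = {
        w: (context_counts.get(w, 0) + 1) / (context_total + len(vocab))
        for w in vocab
    }

    # Start from a uniform target-specific distribution.
    topic = {w: 1.0 / len(vocab) for w in vocab}

    for _ in range(iters):
        # E-step: probability that an occurrence of w came from the target component.
        resp = {
            w: (1 - lam) * topic[w] / ((1 - lam) * topic[w] + lam * background[w])
            for w in vocab
        }
        # M-step: re-estimate the target component from the weighted counts.
        weighted = {w: target_counts[w] * resp[w] for w in vocab}
        norm = sum(weighted.values())
        topic = {w: weighted[w] / norm for w in vocab}

    return topic


# Toy counts (hypothetical): "the" is frequent everywhere, "neural" is distinctive.
target = {"neural": 30, "network": 25, "the": 40}
context = {"the": 4000, "network": 50, "neural": 5}
dist = mixture_keyword_distribution(target, context)
ranked = sorted(dist, key=dist.get, reverse=True)
```

In this toy setting "the" has the highest raw count in the target domain, yet it receives less target-specific probability than "neural" because the context component explains most of its occurrences. The subset-selection step described in the abstract would operate on a distribution of this kind; it is not sketched here because the abstract does not state its objective.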
Anthology ID:
2022.findings-acl.56
Volume:
Findings of the Association for Computational Linguistics: ACL 2022
Month:
May
Year:
2022
Address:
Dublin, Ireland
Venue:
Findings
Publisher:
Association for Computational Linguistics
Pages:
679–692
URL:
https://aclanthology.org/2022.findings-acl.56
DOI:
10.18653/v1/2022.findings-acl.56
Cite (ACL):
Pritom Saha Akash, Jie Huang, Kevin Chang, Yunyao Li, Lucian Popa, and ChengXiang Zhai. 2022. Domain Representative Keywords Selection: A Probabilistic Approach. In Findings of the Association for Computational Linguistics: ACL 2022, pages 679–692, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Domain Representative Keywords Selection: A Probabilistic Approach (Akash et al., Findings 2022)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2022.findings-acl.56.pdf
Code:
pritomsaha/keyword-selection
Data:
AMiner