Abstract
We present a new model for acquiring comprehensive multiword lexicons from large corpora based on competition among n-gram candidates. In contrast to the standard approach of simple ranking by association measure, in our model n-grams are arranged in a lattice structure based on subsumption and overlap relationships, with nodes inhibiting other nodes in their vicinity when they are selected as a lexical item. We show how the configuration of such a lattice can be optimized tractably, and demonstrate using annotations of sampled n-grams that our method consistently outperforms alternatives by at least 0.05 F-score across several corpora and languages.
 - Anthology ID:
 - Q17-1032
 - Volume:
 - Transactions of the Association for Computational Linguistics, Volume 5
 - Year:
 - 2017
 - Address:
 - Cambridge, MA
 - Editors:
 - Lillian Lee, Mark Johnson, Kristina Toutanova
 - Venue:
 - TACL
 - Publisher:
 - MIT Press
 - Pages:
 - 455–470
 - URL:
 - https://aclanthology.org/Q17-1032
 - DOI:
 - 10.1162/tacl_a_00073
 - Cite (ACL):
 - Julian Brooke, Jan Šnajder, and Timothy Baldwin. 2017. Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice. Transactions of the Association for Computational Linguistics, 5:455–470.
 - Cite (Informal):
 - Unsupervised Acquisition of Comprehensive Multiword Lexicons using Competition in an n-gram Lattice (Brooke et al., TACL 2017)
 - PDF:
 - https://preview.aclanthology.org/ingest-acl-2023-videos/Q17-1032.pdf