Optimal Partial Transport Based Sentence Selection for Long-form Document Matching
Weijie Yu, Liang Pang, Jun Xu, Bing Su, Zhenhua Dong, Ji-Rong Wen
Abstract
One typical approach to long-form document matching is first conducting alignment between cross-document sentence pairs, and then aggregating all of the sentence-level matching signals. However, this approach could be problematic because the alignment between documents is partial — despite two documents as a whole are well-matched, most of the sentences could still be dissimilar. Those dissimilar sentences lead to spurious sentence-level matching signals which may overwhelm the real ones, increasing the difficulties of learning the matching function. Therefore, accurately selecting the key sentences for document matching is becoming a challenging issue. To address the issue, we propose a novel matching approach that equips existing document matching models with an Optimal Partial Transport (OPT) based component, namely OPT-Match, which selects the sentences that play a major role in matching. Enjoying the partial transport properties of OPT, the selected key sentences can not only effectively enhance the matching accuracy, but also be explained as the rationales for the matching results. Extensive experiments on four publicly available datasets demonstrated that existing methods equipped with OPT-Match consistently outperformed the corresponding underlying methods. Evaluations also showed that the key sentences selected by OPT-Match were consistent with human-provided rationales.- Anthology ID:
- 2022.coling-1.208
- Volume:
- Proceedings of the 29th International Conference on Computational Linguistics
- Month:
- October
- Year:
- 2022
- Address:
- Gyeongju, Republic of Korea
- Editors:
- Nicoletta Calzolari, Chu-Ren Huang, Hansaem Kim, James Pustejovsky, Leo Wanner, Key-Sun Choi, Pum-Mo Ryu, Hsin-Hsi Chen, Lucia Donatelli, Heng Ji, Sadao Kurohashi, Patrizia Paggio, Nianwen Xue, Seokhwan Kim, Younggyun Hahm, Zhong He, Tony Kyungil Lee, Enrico Santus, Francis Bond, Seung-Hoon Na
- Venue:
- COLING
- SIG:
- Publisher:
- International Committee on Computational Linguistics
- Note:
- Pages:
- 2363–2373
- Language:
- URL:
- https://aclanthology.org/2022.coling-1.208
- DOI:
- Cite (ACL):
- Weijie Yu, Liang Pang, Jun Xu, Bing Su, Zhenhua Dong, and Ji-Rong Wen. 2022. Optimal Partial Transport Based Sentence Selection for Long-form Document Matching. In Proceedings of the 29th International Conference on Computational Linguistics, pages 2363–2373, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.
- Cite (Informal):
- Optimal Partial Transport Based Sentence Selection for Long-form Document Matching (Yu et al., COLING 2022)
- PDF:
- https://preview.aclanthology.org/naacl24-info/2022.coling-1.208.pdf
- Code
- ruc-wjyu/opt-match
- Data
- S2ORC