Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations

Ilias Chalkidis, Manos Fergadiotis, Nikolaos Manginas, Eva Katakalou, Prodromos Malakasiotis


Abstract
Major scandals in corporate history have urged the need for regulatory compliance, where organizations need to ensure that their controls (processes) comply with relevant laws, regulations, and policies. However, keeping track of the constantly changing legislation is difficult, thus organizations are increasingly adopting Regulatory Technology (RegTech) to facilitate the process. To this end, we introduce regulatory information retrieval (REG-IR), an application of document-to-document information retrieval (DOC2DOC IR), where the query is an entire document making the task more challenging than traditional IR where the queries are short. Furthermore, we compile and release two datasets based on the relationships between EU directives and UK legislation. We experiment on these datasets using a typical two-step pipeline approach comprising a pre-fetcher and a neural re-ranker. Experimenting with various pre-fetchers from BM25 to k nearest neighbors over representations from several BERT models, we show that fine-tuning a BERT model on an in-domain classification task produces the best representations for IR. We also show that neural re-rankers under-perform due to contradicting supervision, i.e., similar query-document pairs with opposite labels. Thus, they are biased towards the pre-fetcher’s score. Interestingly, applying a date filter further improves the performance, showcasing the importance of the time dimension.
Anthology ID:
2021.eacl-main.305
Volume:
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Month:
April
Year:
2021
Address:
Online
Venue:
EACL
SIG:
Publisher:
Association for Computational Linguistics
Note:
Pages:
3498–3511
Language:
URL:
https://aclanthology.org/2021.eacl-main.305
DOI:
10.18653/v1/2021.eacl-main.305
Bibkey:
Cite (ACL):
Ilias Chalkidis, Manos Fergadiotis, Nikolaos Manginas, Eva Katakalou, and Prodromos Malakasiotis. 2021. Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 3498–3511, Online. Association for Computational Linguistics.
Cite (Informal):
Regulatory Compliance through Doc2Doc Information Retrieval: A case study in EU/UK legislation where text similarity has limitations (Chalkidis et al., EACL 2021)
Copy Citation:
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.eacl-main.305.pdf