FAIR: Filtering of Automatically Induced Rules
Divya Jyoti Bajpai, Ayush Maheshwari, Manjesh Hanawal, Ganesh Ramakrishnan
Abstract
The availability of large annotated data can be a critical bottleneck in training machine learning algorithms successfully, especially when applied to diverse domains. Weak supervision offers a promising alternative by accelerating the creation of labeled training data using domain-specific rules. However, it requires users to write a diverse set of high-quality rules to assign labels to the unlabeled data. Automatic Rule Induction (ARI) approaches circumvent this problem by automatically creating rules from features on a small labeled set and filtering a final set of rules from them. In the ARI approach, the crucial step is to select a high-quality, useful subset of rules from the large set of automatically created rules. In this paper, we propose an algorithm, FAIR (Filtering of Automatically Induced Rules), to filter rules from a large number of automatically induced rules using submodular objective functions that account for the collective precision, coverage, and conflicts of the rule set. We experiment with three ARI approaches and five text classification datasets to validate the superior performance of our algorithm with respect to several semi-supervised label aggregation approaches. Further, we show that FAIR achieves statistically significant results in comparison to existing rule-filtering approaches. The source code is available at https://github.com/ayushbits/FAIR-LF-Induction.
- Anthology ID:
- 2024.eacl-long.34
- Volume:
- Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
- Month:
- March
- Year:
- 2024
- Address:
- St. Julian’s, Malta
- Editors:
- Yvette Graham, Matthew Purver
- Venue:
- EACL
- Publisher:
- Association for Computational Linguistics
- Pages:
- 573–588
- URL:
- https://aclanthology.org/2024.eacl-long.34
- Cite (ACL):
- Divya Jyoti Bajpai, Ayush Maheshwari, Manjesh Hanawal, and Ganesh Ramakrishnan. 2024. FAIR: Filtering of Automatically Induced Rules. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 573–588, St. Julian’s, Malta. Association for Computational Linguistics.
- Cite (Informal):
- FAIR: Filtering of Automatically Induced Rules (Bajpai et al., EACL 2024)
- PDF:
- https://preview.aclanthology.org/dois-2013-emnlp/2024.eacl-long.34.pdf
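The abstract describes filtering induced rules with submodular objectives over precision, coverage, and conflicts. As a rough illustration only, the sketch below runs a greedy selection over a toy objective of that flavor. It is not the FAIR objective or the authors' code: the `score` function, its weights, the rule names, and the data are all hypothetical, chosen so the example fits on one screen. Each rule is represented as a mapping from instance id to the label it assigns, and `labels` stands in for the small labeled set.

```python
from itertools import combinations


def score(selected, rules, labels, w_prec=1.0, w_conf=1.0):
    """Toy set objective (an assumption, not the paper's formula):
    instances covered + summed per-rule precision - pairwise conflicts."""
    covered = set()
    for r in selected:
        covered.update(rules[r])  # instance ids on which rule r fires

    precision = 0.0
    for r in selected:
        votes = rules[r]
        labeled = sum(1 for i in votes if i in labels)
        correct = sum(1 for i, y in votes.items() if labels.get(i) == y)
        precision += correct / labeled if labeled else 0.0

    conflicts = 0
    for a, b in combinations(selected, 2):
        shared = rules[a].keys() & rules[b].keys()
        conflicts += sum(1 for i in shared if rules[a][i] != rules[b][i])

    return len(covered) + w_prec * precision - w_conf * conflicts


def greedy_filter(rules, labels, k):
    """Add the rule with the largest marginal gain until k rules are
    chosen or no remaining rule improves the objective."""
    selected = []
    remaining = sorted(rules)  # sorted for deterministic tie-breaking
    while remaining and len(selected) < k:
        base = score(selected, rules, labels)
        gains = [(score(selected + [r], rules, labels) - base, r)
                 for r in remaining]
        best_gain, best = max(gains, key=lambda g: g[0])
        if best_gain <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical rules: instance id -> label assigned by the rule.
rules = {
    "r_good": {0: 1, 1: 1, 2: 0},
    "r_dup": {0: 1, 1: 1},       # redundant with r_good
    "r_noisy": {2: 1, 3: 1},     # disagrees with the labeled set
    "r_tail": {3: 0, 4: 0},      # covers otherwise uncovered instances
}
labels = {0: 1, 1: 1, 2: 0, 3: 0}  # small labeled set; instance 4 is unlabeled

print(greedy_filter(rules, labels, k=2))
```

With this toy data the greedy pass picks `r_good` first and then `r_tail`, because `r_tail` adds new coverage while `r_dup` is redundant and `r_noisy` conflicts with an already selected rule. Greedy selection is the usual choice for objectives of this kind because the marginal gain of a rule can only shrink as the selected set grows.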