Abstract
The Web archived data usually contains high-quality documents that are very useful for creating specialized collections of documents. To create such collections, there is a substantial need for automatic approaches that can distinguish the documents of interest for a collection out of the large collections (of millions in size) from Web Archiving institutions. However, the patterns of the documents of interest can differ substantially from one document to another, which makes the automatic classification task very challenging. In this paper, we explore dynamic fusion models to find, on the fly, the model or combination of models that performs best on a variety of document types. Our experimental results show that the approach that fuses different models outperforms individual models and other ensemble methods on three datasets.

- Anthology ID: 2020.lrec-1.182
- Volume: Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month: May
- Year: 2020
- Address: Marseille, France
- Editors: Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue: LREC
- Publisher: European Language Resources Association
- Pages: 1459–1468
- Language: English
- URL: https://aclanthology.org/2020.lrec-1.182
- Cite (ACL): Krutarth Patel, Cornelia Caragea, and Mark Phillips. 2020. Dynamic Classification in Web Archiving Collections. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1459–1468, Marseille, France. European Language Resources Association.
- Cite (Informal): Dynamic Classification in Web Archiving Collections (Patel et al., LREC 2020)
- PDF: https://preview.aclanthology.org/proper-vol2-ingestion/2020.lrec-1.182.pdf