@inproceedings{fairon-etal-2008-glossanet,
title = "{G}lossa{N}et 2: a linguistic search engine for {RSS}-based corpora",
author = "Fairon, C{\'e}drick and
Mac{\'e}, K{\'e}vin and
Naets, Hubert",
editor = "Evert, Stefan and
Kilgarriff, Adam and
Sharoff, Serge",
booktitle = "Proceedings of the 4th Web as Corpus Workshop",
month = jun,
year = "2008",
address = "Marrakech, Morocco",
publisher = "European Language Resources Association",
url = "https://preview.aclanthology.org/jlcl-multiple-ingestion/2008.wac-1.6/",
pages = "34--39",
abstract = "This paper presents GlossaNet 2, a free online concordance service that enables users to search dynamic Web corpora. Using GlossaNet involves two steps. First, users define a corpus by selecting RSS feeds from a preselected pool of sources (they can also add their own RSS feeds). A crawler visits these sources on a regular basis in order to generate a dynamic corpus. Second, the user can register one or more search queries on his or her dynamic corpus. Search queries are re-applied to the corpus every time it is updated, and new concordances are recorded for the user (results can be emailed, published for the user in a private RSS feed, or viewed online). This service integrates two preexisting programs: Corporator (Fairon, 2006), a program that creates corpora by downloading and filtering RSS feeds, and Unitex (Paumier, 2003), an open source corpus processor that relies on linguistic resources. After a short introduction, we will briefly present the concept of {\textquotedblleft}RSS corpora{\textquotedblright} and the assets of this approach to corpus development. We will then give an overview of the GlossaNet architecture and present various cases of use."
}