Abstract
In this paper we report on constructing a finite state automaton comprising automatically extracted terminology and significant collocation patterns from a training corpus of specialist news (Reuters Financial News). The automaton can be used to unambiguously identify sentiment-bearing words that might be able to make or break people, companies, perhaps even governments. The paper presents the emerging face of corpus linguistics where a corpus is used to bootstrap both the terminology and the significant meaning bearing patterns from the corpus. Much of the current content analysis software systems require a human coder to eyeball terms and sentiment words. Such an approach might yield very good quality results on small text collections but when confronted with a 40-50 million word corpus such an approach does not scale, and a large-scale computer-based approach is required. We report on the use of Grid computing technologies and techniques to cope with this analysis.- Anthology ID:
- L06-1231
- Volume:
- Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
- Month:
- May
- Year:
- 2006
- Address:
- Genoa, Italy
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Aldo Gangemi, Bente Maegaard, Joseph Mariani, Jan Odijk, Daniel Tapias
- Venue:
- LREC
- SIG:
- Publisher:
- European Language Resources Association (ELRA)
- Note:
- Pages:
- Language:
- URL:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/394_pdf.pdf
- DOI:
- Cite (ACL):
- Khurshid Ahmad, Lee Gillam, and David Cheng. 2006. Sentiments on a Grid: Analysis of Streaming News and Views. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy. European Language Resources Association (ELRA).
- Cite (Informal):
- Sentiments on a Grid: Analysis of Streaming News and Views (Ahmad et al., LREC 2006)
- PDF:
- http://www.lrec-conf.org/proceedings/lrec2006/pdf/394_pdf.pdf