A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing
Sebastian G.M. Händschke, Sven Buechel, Jan Goldenstein, Philipp Poschmann, Tinghui Duan, Peter Walgenbach, Udo Hahn
Abstract
We introduce JOCo, a novel text corpus for NLP analytics in the field of economics, business and management. This corpus is composed of corporate annual and social responsibility reports of the top 30 US, UK and German companies in the major (DJIA, FTSE 100, DAX), middle-sized (S&P 500, FTSE 250, MDAX) and technology (NASDAQ, FTSE AIM 100, TECDAX) stock indices, respectively. Altogether, this adds up to 5,000 reports from 270 companies headquartered in three of the world’s most important economies. The corpus spans a time frame from 2000 up to 2015 and contains, in total, 282M tokens. We also feature JOCo in a small-scale experiment to demonstrate its potential for NLP-fueled studies in economics, business and management research.- Anthology ID:
- W18-3103
- Volume:
- Proceedings of the First Workshop on Economics and Natural Language Processing
- Month:
- July
- Year:
- 2018
- Address:
- Melbourne, Australia
- Venue:
- ACL
- SIG:
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 20–31
- Language:
- URL:
- https://aclanthology.org/W18-3103
- DOI:
- 10.18653/v1/W18-3103
- Cite (ACL):
- Sebastian G.M. Händschke, Sven Buechel, Jan Goldenstein, Philipp Poschmann, Tinghui Duan, Peter Walgenbach, and Udo Hahn. 2018. A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing. In Proceedings of the First Workshop on Economics and Natural Language Processing, pages 20–31, Melbourne, Australia. Association for Computational Linguistics.
- Cite (Informal):
- A Corpus of Corporate Annual and Social Responsibility Reports: 280 Million Tokens of Balanced Organizational Writing (Händschke et al., ACL 2018)
- PDF:
- https://preview.aclanthology.org/ingestion-script-update/W18-3103.pdf