Stacey Taylor


2020

DNLP@FinTOC’20: Table of Contents Detection in Financial Documents
Dijana Kosmajac | Stacey Taylor | Mozhgan Saeidi
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation

Title Detection and Table of Contents Generation are important components in detecting document structure. In particular, these two elements provide the skeleton of the document, giving users an understanding of its organization, the relevance of its information, and where to find information within it. Here, we show that using Tesseract with Levenshtein distance and a feature set inspired by Alk et al., we were able to classify titles with an F1 measure of 0.73 and 0.87, and the table of contents with a harmonic mean of 0.36 and 0.39, in English and French respectively. Our methodology works with both PDF and scanned documents, giving it a wide range of applicability within the document engineering and storage domains.

e-Commerce and Sentiment Analysis: Predicting Outcomes of Class Action Lawsuits
Stacey Taylor | Vlado Keselj
Proceedings of the 3rd Workshop on e-Commerce and NLP

In recent years, the focus of e-Commerce research has been on better understanding the relationship between the internet marketplace, customers, and goods and services. This has been done by examining, for example, consumer information, recommender systems, click rates, or the way purchasers go about making buying decisions. This paper takes a very different approach and examines the companies themselves. In the past ten years, e-Commerce giants such as Amazon, Skymall, Wayfair, and Groupon have been embroiled in securities class action lawsuits brought under Rule 10b-5, which, in short, is one of the Securities and Exchange Commission’s main rules surrounding fraud. Lawsuits are extremely expensive for a company and can damage its brand extensively, with the shareholders left to suffer the consequences. We examined the Management Discussion and Analysis and the Market Risks for 96 companies using sentiment analysis on selected financial measures and found that we were able to predict the outcome of the lawsuits in our dataset using sentiment (tone) alone, with a recall of 0.8207 using the Random Forest classifier. We believe that this is an important contribution as it has cross-domain implications and potential, and opens up new areas of research in e-Commerce, finance, and law, as the settlements from the class action lawsuits in our dataset alone exceed $1.6 billion in aggregate.

Using Extractive Lexicon-based Sentiment Analysis to Enhance Understanding of the Impact of Non-GAAP Measures in Financial Reporting
Stacey Taylor | Vlado Keselj
Proceedings of the Second Workshop on Financial Technology and Natural Language Processing