2023
Exploring Techniques to Detect and Mitigate Non-Inclusive Language Bias in Marketing Communications Using a Dictionary-Based Approach
Bharathi Raja Chakravarthi | Prasanna Kumar Kumaresan | Rahul Ponnusamy | John P. McCrae | Michaela Comerford | Jay Megaro | Deniz Keles | Last Feremenga
Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
We propose a new dataset for detecting non-inclusive language in English sentences. The sentences were gathered from public sites that explain what is inclusive and what is non-inclusive. We also extracted potentially non-inclusive keywords and phrases from guidelines on business websites. A phrase dictionary was first created by automatic expansion using a word embedding trained on a massive corpus of general English text. The final phrase dictionary was then constructed by hand-editing the first one to exclude inappropriate expansions and add the keywords from the guidelines. In a business context, the words individuals use can significantly impact the culture of inclusion and the quality of interactions with clients and prospects. Knowing which words to avoid helps customers of different backgrounds and historically excluded groups feel included, and can make it easier to have productive, engaging, and positive communications. You can find the dictionaries, the code, and the method for making requests for the corpus at (we will release the link for data and code once the paper is accepted).
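As a rough illustration of the dictionary-based detection step described in the abstract above, the following minimal sketch flags dictionary phrases in a sentence. The dictionary entries, the suggested alternatives, and the names `PHRASE_DICTIONARY`, `build_pattern`, and `flag_phrases` are all hypothetical and chosen for illustration; they are not taken from the paper's released resources, and the embedding-based expansion and hand-editing steps are not shown.

```python
import re

# Hypothetical entries for illustration only; NOT the paper's dictionary.
# Keys are lowercase phrases to flag, values are suggested alternatives.
PHRASE_DICTIONARY = {
    "chairman": "chairperson",
    "manpower": "workforce",
    "man hours": "person hours",
}


def build_pattern(phrases):
    """Compile one case-insensitive regex that matches any whole phrase."""
    # Longest phrases first so multi-word entries win over shorter overlaps.
    ordered = sorted(phrases, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


PATTERN = build_pattern(PHRASE_DICTIONARY)


def flag_phrases(sentence):
    """Return (matched text, start offset, suggestion) for each dictionary hit."""
    hits = []
    for match in PATTERN.finditer(sentence):
        key = match.group(0).lower()
        hits.append((match.group(0), match.start(), PHRASE_DICTIONARY[key]))
    return hits


if __name__ == "__main__":
    for hit in flag_phrases("The Chairman approved the manpower budget."):
        print(hit)
```

Word-boundary matching keeps the sketch from flagging substrings of longer words (for example, "chairmanship" is not matched by the "chairman" entry); a real system would likely also need lemmatisation and context checks, which this sketch leaves out.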
2022
Towards Classification of Legal Pharmaceutical Text using GAN-BERT
Tapan Auti | Rajdeep Sarkar | Bernardo Stearns | Atul Kr. Ojha | Arindam Paul | Michaela Comerford | Jay Megaro | John Mariano | Vall Herard | John P. McCrae
Proceedings of the First Computing Social Responsibility Workshop within the 13th Language Resources and Evaluation Conference
Pharmaceutical text classification is an important area of research for commercial and research institutions working in the pharmaceutical domain. The task is challenging because it requires expert-verified labelled data, which can be expensive and time-consuming to obtain. To this end, we leverage predictive coding methods for the task, as they have been shown to generalise well for sentence classification. Specifically, we utilise the GAN-BERT architecture to classify pharmaceutical texts. To capture domain specificity, we propose using the BioBERT model as the BERT model in the GAN-BERT framework. We conduct an extensive evaluation to show the efficacy of our approach over baselines on multiple metrics.
2021
Few-shot and Zero-shot Approaches to Legal Text Classification: A Case Study in the Financial Sector
Rajdeep Sarkar | Atul Kr. Ojha | Jay Megaro | John Mariano | Vall Herard | John P. McCrae
Proceedings of the Natural Legal Language Processing Workshop 2021
The application of predictive coding techniques to legal texts has the potential to greatly reduce the cost of legal review of documents. However, the array of legal tasks is so wide, and legislation evolves so continuously, that it is hard to construct sufficient training data to cover all cases. In this paper, we investigate few-shot and zero-shot approaches that require substantially less training data, and we introduce a triplet architecture which, for promissory statements, produces performance close to that of a supervised system. This method allows predictive coding methods to be rapidly developed for new regulations and markets.