# Supplementary material for *Stop Word Lists in Free Open-source Software Packages*

NLP-OSS Workshop 2018

Attached are the primary data from https://github.com/igorbrigadir/stopwords, together with the scripts used to perform the analysis presented in the paper.

Datasets:

stopwords/ contains our data taken from https://github.com/igorbrigadir/stopwords. (See Section 4)

Scripts:

preprocessing/ contains the scripts for data preprocessing.

cluster-stop-lists.py generates the hierarchically clustered heatmap of stop word lists. (See Section 5 & Figure 5)

*.ipynb notebooks contain the analysis in Section 6.

upset-with-words.py generates the UpSet plot for selected words. (See Section 6.2 & Figure 4)

incompleteness-analysis.py explores the incompleteness of stop word lists. (See Section 6.3)
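
For readers who want to explore the data before running the scripts, the following is a minimal sketch of the kind of comparison the scripts above perform. It is not taken from the attached scripts. It assumes that stopwords/ holds one stop list per `.txt` file with whitespace-separated words; that layout, the choice of Jaccard similarity, and the function names `load_stop_lists`, `jaccard`, `pairwise_jaccard`, and `membership` are all assumptions made for illustration.

```python
from itertools import combinations
from pathlib import Path


def load_stop_lists(directory):
    """Read every *.txt file in `directory` as one stop list.

    Assumption: each file holds whitespace-separated words (e.g. one per line).
    Returns a dict mapping the file stem to a set of lowercased words.
    """
    lists = {}
    for path in sorted(Path(directory).glob("*.txt")):
        words = path.read_text(encoding="utf-8").split()
        lists[path.stem] = {w.lower() for w in words}
    return lists


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; taken to be 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_jaccard(lists):
    """Similarity for every unordered pair of lists, keyed by sorted name pair."""
    return {
        (m, n): jaccard(lists[m], lists[n])
        for m, n in combinations(sorted(lists), 2)
    }


def membership(lists, words):
    """For each word, the sorted names of the lists that contain it."""
    return {w: sorted(name for name, s in lists.items() if w in s) for w in words}
```

Calling `pairwise_jaccard(load_stop_lists("stopwords"))` would give a similarity table that could feed a clustered heatmap, and `membership` gives the per-word set memberships that an UpSet plot visualizes.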
