# ACL Citations

This is the dataset for the following paper:

+ Marcel Bollmann and Desmond Elliott (2020). **On Forgetting to Cite Older
  Papers: An Analysis of the ACL Anthology.** In *Proceedings of ACL 2020.*

The accompanying repository is <https://github.com/coastalcph/acl-citations/>.

## Dataset

We include the two data files on which our analysis is based:

+ `acl-parscit.tsv` contains the publication years of the papers cited in the
  References section of [ACL Anthology](https://www.aclweb.org/anthology)
  papers. It is a TSV file with the following columns:

  1. The ACL Anthology ID of the paper we extracted references from.
  2. The year of publication of that paper.
  3. A comma-separated list of the publication years of the papers in its
     References section.

+ `citations-all.matched.tsv` is the result of extracting author/title
  information for each cited paper and running it through our fuzzy-matching
  algorithm. It is a TSV file with one reference entry per line and the
  following columns:

  1. An ID for the extracted reference.
  2. The number of times this reference was cited in our dataset.
  3. The year of publication of the extracted reference.
  4. Its author list.
  5. Its title.
  6. A comma-separated list of ACL Anthology IDs of papers that were identified
     as citing this reference.
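
For orientation, here is a minimal Python sketch of how the two files could be read, based only on the column descriptions above. It is not part of the repository's code. The names `PaperYears`, `MatchedReference`, `read_parscit`, and `read_matched` are hypothetical, and the sketch assumes that fields are separated by single tabs, that there is no quoting, and that any line with too few columns (such as a possible header or blank line) can be skipped. Values that do not parse as integers are stored as `None` or dropped rather than guessed.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class PaperYears(NamedTuple):
    """One row of acl-parscit.tsv (hypothetical container name)."""
    anthology_id: str
    year: Optional[int]
    cited_years: List[int]


class MatchedReference(NamedTuple):
    """One row of citations-all.matched.tsv (hypothetical container name)."""
    ref_id: str
    num_citations: Optional[int]
    year: Optional[int]
    authors: str
    title: str
    citing_ids: List[str]


def to_int(text: str) -> Optional[int]:
    """Return text as an int, or None if it is not a plain number."""
    text = text.strip()
    return int(text) if text.isdigit() else None


def split_list(field: str) -> List[str]:
    """Split a comma-separated field, dropping empty items."""
    return [item.strip() for item in field.split(",") if item.strip()]


def read_parscit(lines: Iterable[str]) -> Iterator[PaperYears]:
    """Yield one PaperYears per line with at least three columns."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # assumption: short lines carry no data
        paper_id, year, cited = fields[:3]
        # Keep only the cited years that parse as integers.
        years = [y for y in map(to_int, split_list(cited)) if y is not None]
        yield PaperYears(paper_id, to_int(year), years)


def read_matched(lines: Iterable[str]) -> Iterator[MatchedReference]:
    """Yield one MatchedReference per line with at least six columns."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 6:
            continue  # assumption: short lines carry no data
        ref_id, count, year, authors, title, citing = fields[:6]
        # The author list is kept as a raw string; its format is not
        # specified in this README.
        yield MatchedReference(
            ref_id, to_int(count), to_int(year), authors, title,
            split_list(citing),
        )
```

Both readers accept any iterable of lines, so an open file handle works directly, for example `with open("acl-parscit.tsv", encoding="utf-8") as f: papers = list(read_parscit(f))`. The UTF-8 encoding in that call is likewise an assumption.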
