The Anthology’s data is hosted in our GitHub repository, which contains the metadata for all its papers (under data/xml) and their authors (data/yaml/people.yaml), as well as for the volumes those papers are organized into and the real-world events at which those volumes were presented. The PDFs are hosted on our servers.
All of this data is accessible via the ACL Anthology Python module on PyPI, which you can install with pip:
pip install acl-anthology
Please see our repository documentation or the Python module documentation for more information.
You may also be interested in learning how easy it is to cite our papers in a variety of citation formats.
Citing papers in the ACL Anthology is simple. We provide:
- bulk bibliographic exports
- click-to-copy citation keys on each paper page
- per-paper citation downloads
Our primary supported format is BibTeX. The simplest way to cite papers is to download the bulk BibTeX exports and then use the citation keys, which can often be inferred from the paper’s author list and title. Keys follow the pattern {authors}-{year}-{title-word}, where:
- {authors} is the last names of the first one or two authors, separated by a hyphen; if there are more than two authors, etal is used in place of the second author
- {year} is the four-digit year of publication
- {title-word} is the first significant word of the paper’s title; additional words are appended if needed to make the bibkey unique
Some examples are galley-etal-2004-whats and huang-chiang-2005-better. For convenience, a button on each paper page provides click-to-copy access to the bibkey.
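To illustrate the key pattern, here is a minimal sketch that builds a bibkey from an author list, a year, and a title. It is not the Anthology's own implementation: the function names and, in particular, the stopword list that decides which title words count as "significant" are assumptions made for this example, so real keys may differ in edge cases.

```python
import re
import unicodedata
from typing import List, Optional, Set

# Hypothetical stopword list for this sketch; the Anthology's actual list may differ.
STOPWORDS = {"a", "an", "and", "at", "by", "for", "from", "in", "of", "on", "the", "to", "with"}


def _slug(text: str) -> str:
    """Strip accents, lowercase, and keep only ASCII letters and digits.

    Characters with no ASCII decomposition (e.g. "ø") are simply dropped.
    """
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]", "", ascii_text.lower())


def make_bibkey(last_names: List[str], year: int, title: str,
                taken: Optional[Set[str]] = None) -> str:
    """Build a key of the form {authors}-{year}-{title-word} (hypothetical helper)."""
    if not last_names:
        raise ValueError("at least one author is required")
    taken = taken if taken is not None else set()
    if len(last_names) == 1:
        authors = _slug(last_names[0])
    elif len(last_names) == 2:
        authors = f"{_slug(last_names[0])}-{_slug(last_names[1])}"
    else:
        # More than two authors: "etal" stands in for the second author.
        authors = f"{_slug(last_names[0])}-etal"
    # Significant words: non-empty after slugging and not in the stopword list.
    words = [w for w in (_slug(t) for t in title.split()) if w and w not in STOPWORDS]
    key = f"{authors}-{year}"
    # Append title words one at a time until the key is not already taken.
    for word in words:
        key = f"{key}-{word}"
        if key not in taken:
            return key
    raise ValueError(f"could not build a unique key from title {title!r}")


print(make_bibkey(["Galley", "Hopkins", "Knight", "Marcu"], 2004, "What's in a Translation Rule?"))
print(make_bibkey(["Huang", "Chiang"], 2005, "Better k-best Parsing"))
```

With inputs chosen to match the two example keys above, the calls print galley-etal-2004-whats and huang-chiang-2005-better. Passing a taken set shows the uniqueness rule: if huang-chiang-2005-better were already in use, the sketch would fall back to huang-chiang-2005-better-kbest.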
Bulk downloads for consolidated BibTeX files are available in the following variations.
- Overleaf-friendly: anthology-1.bib, anthology-2.bib etc. are sharded variants that are under 50 MB each, suitable for direct import into Overleaf repositories.
- Full, with abstracts: anthology+abstracts.bib.gz contains citations for all papers that exist in the Anthology, including abstracts.
- No abstracts: anthology.bib.gz contains all citations but removes abstracts, to save on space.
- No abstracts, uncompressed: anthology.bib is the same as the above, but provided uncompressed, for convenience.
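Once a bulk file is on disk, a single entry can be pulled out by its key. The sketch below is a hypothetical helper, not part of any Anthology tool. It assumes the file (for example anthology.bib.gz or one of the anthology-N.bib shards) has already been downloaded to a local path, and that every BibTeX entry starts with "@" at the beginning of a line.

```python
import gzip
import re
from typing import Optional


def find_bibtex_entry(path: str, key: str) -> Optional[str]:
    """Return the BibTeX entry with the given key, or None if it is absent.

    Handles both .bib and .bib.gz files and reads them line by line.
    Assumes each entry begins with "@" at the start of a line.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    header = re.compile(r"@\w+\{" + re.escape(key) + r",")
    collected = []
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("@"):
                if collected:
                    break  # the next entry has started, so ours is complete
                if header.match(line):
                    collected.append(line)
            elif collected:
                collected.append(line)  # continuation of the matching entry
    return "".join(collected).strip() or None
```

Streaming the file keeps memory use low even for the full export with abstracts, at the cost of a linear scan per lookup; for many lookups, building a key-to-entry dictionary once would be faster.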
For individual papers, buttons are provided to download citation data in a number of formats, including BibTeX, MODS XML, Endnote, and an informal citation string.
These formats can be downloaded as files or copied to the clipboard via convenient buttons.
Finally, we also offer an XML paper feed, which is useful in tools like Zotero and Mendeley.
Every paper in the Anthology is assigned an Anthology ID.
Since 2020, identifiers have the form {year}.{venue}-{volume}.{#}, where {year} is the four-digit year, {venue} is a lowercase venue code made up of ASCII letters and digits, {volume} is a volume name or number, and {#} is the paper number.
Prior to 2020, this identifier took the form CYY-VPPP or CYY-VVPP, where C is a collection, YY a two-digit year, V a volume, and P a paper ID.
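The two identifier styles can be told apart with a pair of regular expressions. This is a minimal sketch with hypothetical names. It assumes that {volume} uses the same lowercase letters-and-digits alphabet as {venue}, and it leaves the four trailing digits of old-style IDs unsplit, because whether they read as VPPP or VVPP depends on the collection.

```python
import re
from typing import Dict

# New style: {year}.{venue}-{volume}.{#}
# Assumption: {volume} uses lowercase ASCII letters and digits, like {venue}.
NEW_STYLE = re.compile(
    r"(?P<year>\d{4})\.(?P<venue>[a-z0-9]+)-(?P<volume>[a-z0-9]+)\.(?P<paper>\d+)"
)
# Old style: CYY-VPPP or CYY-VVPP. The trailing four digits are kept together,
# since the volume/paper split varies and is not decided here.
OLD_STYLE = re.compile(r"(?P<collection>[A-Z])(?P<year>\d{2})-(?P<volume_paper>\d{4})")


def parse_anthology_id(anthology_id: str) -> Dict[str, str]:
    """Split an Anthology ID into named components (hypothetical helper)."""
    match = NEW_STYLE.fullmatch(anthology_id)
    if match:
        return {"style": "new", **match.groupdict()}
    match = OLD_STYLE.fullmatch(anthology_id)
    if match:
        return {"style": "old", **match.groupdict()}
    raise ValueError(f"not a recognized Anthology ID: {anthology_id!r}")


print(parse_anthology_id("2020.iwclul-1.4"))
print(parse_anthology_id("E91-1001"))
```

For 2020.iwclul-1.4 this yields year 2020, venue iwclul, volume 1, and paper 4; for E91-1001 it yields collection E, year 91, and the combined volume/paper digits 1001.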
The canonical URL of an Anthology paper is given by appending this identifier to the Anthology’s base URL https://www.aclanthology.org/; e.g., https://www.aclanthology.org/2020.iwclul-1.4 (new style) or https://www.aclanthology.org/E91-1001 (old style).
This is the paper’s landing page, which includes (among other things) a link to the PDF.
Many papers in the Anthology also have Digital Object Identifiers (DOIs).
Both DOIs and canonical Anthology URLs embed the paper’s Anthology ID (for old-style identifiers, an 8-character string such as E91-1001).
When a paper has a DOI, the DOI URL redirects to the paper’s canonical Anthology URL, and the DOI is listed on that page.
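Because the Anthology ID is embedded in the DOI, it can be recovered from a DOI string. The sketch below assumes the ID is the final "/"-separated segment of the DOI; the helper name is hypothetical, and the 10.18653/v1/ prefix in the examples is used for illustration only.

```python
from urllib.parse import urlparse


def anthology_id_from_doi(doi: str) -> str:
    """Return the Anthology ID embedded in a DOI (hypothetical helper).

    Accepts a bare DOI or a resolver URL such as https://doi.org/...
    Assumes the Anthology ID is the last "/"-separated segment.
    """
    if doi.startswith(("http://", "https://")):
        doi = urlparse(doi).path.lstrip("/")
    _, sep, anthology_id = doi.rpartition("/")
    if not sep or not anthology_id:
        raise ValueError(f"no Anthology ID found in {doi!r}")
    return anthology_id


print(anthology_id_from_doi("10.18653/v1/P19-1001"))
print(anthology_id_from_doi("https://doi.org/10.18653/v1/2020.acl-main.1"))
```

The recovered ID can then be appended to the Anthology’s base URL to reach the paper’s landing page directly.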
Variations of the canonical URL can be used to access the PDF and citation format files directly, for example by appending a file extension such as .pdf or .bib to the canonical URL.
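The URL scheme can be captured in a few lines. This sketch uses the base URL exactly as given above and assumes that file variants are formed by appending an extension (.pdf and .bib here) to the canonical URL; the function name is hypothetical, and the ID check is deliberately loose.

```python
import re
from typing import Dict

# Base URL exactly as given in this document.
BASE_URL = "https://www.aclanthology.org/"
# Loose check for new-style ({year}.{venue}-{volume}.{#}) or old-style (CYY-NNNN) IDs.
ID_PATTERN = re.compile(r"\d{4}\.[a-z0-9]+-[a-z0-9]+\.\d+|[A-Z]\d{2}-\d{4}")


def paper_urls(anthology_id: str) -> Dict[str, str]:
    """Return the landing-page URL and assumed file URLs for an Anthology ID."""
    if not ID_PATTERN.fullmatch(anthology_id):
        raise ValueError(f"not a recognized Anthology ID: {anthology_id!r}")
    landing = BASE_URL + anthology_id
    return {
        "landing": landing,
        # Assumed variants: the canonical URL with a file extension appended.
        "pdf": landing + ".pdf",
        "bib": landing + ".bib",
    }


print(paper_urls("2020.iwclul-1.4")["pdf"])
```

Validating the ID before building URLs catches typos early rather than producing links that lead nowhere.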