Our paper was accepted to EMNLP Findings (not conditionally accepted), but we would like to include this note to the AC describing the revisions we made in our camera-ready version.

We have made our training and evaluation code publicly available at https://github.com/g-luo/geolocation_via_guidebook_grounding, which we link in the main paper. In this repository, we have made two simplifications to our training scheme since the submission of our paper: we removed duplicates in the guidebook, and we used constant weighting for our pseudo-label losses (rather than computing the weights on a per-image basis). We have rerun our experiments with this simplified public repository and updated the experimental results reported in Tables 2, 3 (test), 4, 5 (val) and the relevant prose accordingly. The impact of these changes is very small, and our key findings remain the same: ISN achieves 65% and G^3 achieves 70% Top-1 classification accuracy, so G^3 still outperforms ISN by more than 5%.

We have also revised the main paper following the reviewers' suggestions, including expanding on the use of the weak attention supervision in Section 4 and adding a qualitative example, in Figure 9 of the Appendix, where ISN and ISN + CLIP succeed but G^3 fails.