Jacobo Myerston
Sentiment analysis of historical literature provides valuable insights for humanities research, yet remains challenging due to scarce annotations and limited generalization of models trained on modern texts. Prior work has primarily focused on two directions: using sentiment lexicons or leveraging large language models (LLMs) for annotation. However, lexicons are often unavailable for historical texts due to limited linguistic resources, and LLM-generated labels often reflect modern sentiment norms and fail to capture the implicit, ironic, or morally nuanced expressions typical of historical literature, resulting in noisy supervision. To address these issues, we introduce a role-guided annotation strategy that prompts LLMs to simulate historically situated perspectives when labeling sentiment. Furthermore, we design a prototype-aligned framework that learns sentiment prototypes from high-resource data and aligns them with low-resource representations via symmetric contrastive loss, improving robustness to noisy labels. Experiments across multiple historical literature datasets show that our method outperforms state-of-the-art baselines, demonstrating its effectiveness.
Standard natural language processing (NLP) pipelines operate on symbolic representations of language, which typically consist of sequences of discrete tokens. However, creating an analogous representation for ancient logographic writing systems is an extremely labor-intensive process that requires expert knowledge. At present, a large portion of logographic data persists in a purely visual form due to the absence of transcription. This poses a bottleneck for researchers seeking to apply NLP toolkits to study ancient logographic languages: most of the relevant data are images of writing. This paper investigates whether direct processing of visual representations of language offers a potential solution. We introduce LogogramNLP, the first benchmark enabling NLP analysis of ancient logographic languages, featuring both transcribed and visual datasets for four writing systems along with annotations for tasks like classification, translation, and parsing. Our experiments compare systems that employ recent visual and text encoding strategies as backbones. The results demonstrate that visual representations outperform textual representations for some investigated tasks, suggesting that visual processing pipelines may unlock a large amount of cultural heritage data of logographic languages for NLP-based analyses. Data and code are available at https://logogramNLP.github.io/.
Cuneiform, the oldest writing system, was used for more than 3,000 years in ancient Mesopotamia. Cuneiform is written on clay tablets, which are hard to date because they often lack explicit references to time periods and their paleographic traits are not always reliable as a dating criterion. In this paper, we systematically analyse cuneiform dating problems using machine learning. We build baseline models for both visual and textual features and identify two major issues: confounds and distribution shift. We apply adversarial regularization and deep domain adaptation to mitigate these issues. On tablets from the same museum collections represented in the training set, we achieve accuracies as high as 84.42%. However, when test tablets are taken from held-out collections, models generalize more poorly. This is only partially mitigated by robust learning techniques, highlighting important challenges for future work.