Han Ren
Pre-trained language models have achieved success in many natural language processing tasks, but their time-agnostic setting limits their performance in automatic text dating. This paper introduces TicTac, a supervised fine-tuning model for automatic text dating. Unlike existing models, which ignore the temporal relatedness of documents, TicTac learns temporal semantic information, which helps capture temporal implications in corpora spanning long periods. As a fine-tuning framework, TicTac employs a contrastive learning-based approach to model two types of temporal relations between diachronic documents. TicTac also adopts a metric learning approach that estimates the temporal distance between a historical text and its category label, which helps the model learn temporal semantic information from temporally ordered texts. Experiments on two diachronic corpora show that our model effectively captures temporal semantic information and outperforms state-of-the-art baselines.
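The contrastive and metric learning ideas in this abstract can be illustrated with a minimal sketch. It is not the TicTac implementation: the abstract does not specify the two temporal relations or the distance function, so the sketch assumes a single relation (documents from the same period are positives), cosine similarity, and an InfoNCE-style loss. The function names, the `temperature` parameter, and the period prototypes are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def temporal_contrastive_loss(embeddings, periods, anchor, temperature=0.1):
    """InfoNCE-style loss for one anchor document.

    Assumption: documents sharing the anchor's period label are positives,
    and all other documents are negatives.
    """
    others = [i for i in range(len(embeddings)) if i != anchor]
    positives = [i for i in others if periods[i] == periods[anchor]]
    if not positives:
        raise ValueError("anchor needs at least one same-period document")
    logits = {
        i: cosine(embeddings[anchor], embeddings[i]) / temperature
        for i in others
    }
    log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
    # Average the negative log-probability over all positives.
    return -sum(logits[i] - log_denominator for i in positives) / len(positives)


def nearest_period(text_embedding, period_prototypes):
    """Metric-style prediction: pick the period whose prototype is closest.

    Assumption: each period label is represented by one prototype vector.
    """
    return max(
        period_prototypes,
        key=lambda p: cosine(text_embedding, period_prototypes[p]),
    )
```

With such a loss, embeddings of same-period documents are pulled together while documents from other periods are pushed apart, so a nearest-prototype lookup becomes a plausible way to assign a period label.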
This paper presents Temporal-aware Soft Prompt Tuning (TASPT), a novel approach for automatic text dating. Unlike existing methods, which often overlook the evolution of word meanings in texts spanning long periods, TASPT incorporates the unique characteristics of historical texts. It introduces a temporal-aware text representation that dynamically captures both semantic variance and invariance. This representation is combined with a soft prompt, enabling efficient parameter tuning for automatic text dating. Experiments show that TASPT outperforms all existing methods on two diachronic datasets: the Twenty-Four Histories and the Royal Society Corpus.
Automatic text dating (ATD) is a challenging task because explicit temporal mentions usually do not appear in texts. Existing state-of-the-art approaches learn word representations via language models, but most of them ignore the diachronic change of words, which may reduce the effectiveness of text modeling. Meanwhile, few of them consider text modeling for long diachronic documents. In this paper, we present a time-aware language model named TALM that learns temporal word representations by transferring language models of general domains to time-specific ones. We also build a hierarchical modeling approach that represents diachronic documents by encoding them with temporal word representations. Experiments on a Chinese diachronic corpus show that our model effectively captures the implicit temporal information of words and outperforms state-of-the-art approaches in historical text dating.
Catchwords are popular words or phrases within a certain area during a certain period of time. In this paper, we propose a novel approach for automatic Chinese catchword extraction. We first discuss the linguistic definition of catchwords and analyze their features through manual evaluation. Based on these features, we define three aspects that describe the Popular Degree of catchwords. To extract terms with maximum meaning, we adopt an effective ATE algorithm for multi-character words and long phrases. We then use conic fitting in Time Series Analysis to build Popular Degree Curves of the extracted terms. To calculate the Popular Degree Values of catchwords, we propose a formula that combines the values of Popular Trend, Peak Value and Popular Keeping. Finally, a ranking list of catchword candidates is built according to Popular Degree Values. Experiments show that automatic Chinese catchword extraction is effective and objective in comparison with manual evaluation.
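The curve-fitting and scoring steps in this abstract can be sketched as follows. The abstract gives neither the fitting procedure nor the Popular Degree formula, so everything here is an assumption: conic fitting is approximated by a least-squares parabola over a term's per-period frequency counts, and `popular_degree` combines three placeholder indicators (final slope as trend, fitted maximum as peak, share of periods above half the peak as keeping) with hypothetical weights. Counts are assumed to be non-negative and to cover at least three periods.

```python
def fit_parabola(ys):
    """Least-squares fit of y = a*t^2 + b*t + c for t = 0, 1, ..., len(ys) - 1."""
    ts = range(len(ys))
    # Power sums that make up the 3x3 normal equations.
    s = [sum(t ** k for t in ts) for k in range(5)]
    r = [sum(y * t ** k for t, y in zip(ts, ys)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], r[2]],
        [s[3], s[2], s[1], r[1]],
        [s[2], s[1], s[0], r[0]],
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * p for x, p in zip(m[row], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def popular_degree(ys, weights=(1.0, 1.0, 1.0)):
    """Hypothetical score; NOT the formula from the paper."""
    a, b, c = fit_parabola(ys)
    n = len(ys)
    fitted = [a * t * t + b * t + c for t in range(n)]
    trend = 2 * a * (n - 1) + b  # slope of the fitted curve at the last period
    peak = max(fitted)           # highest fitted popularity
    keeping = sum(1 for v in fitted if v >= 0.5 * peak) / n
    w_trend, w_peak, w_keep = weights
    return w_trend * trend + w_peak * peak + w_keep * keeping
```

Ranking candidate terms by such a score favors terms whose fitted curve is still rising over terms with the same peak that are already declining; the weights would need tuning against manual evaluation.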