Leixin Zhang
Variation in human annotation and human perspectives has drawn increasing attention in natural language processing research. Disagreement observed in data annotation challenges the conventional assumption of a single “ground truth” and the use of uniform models trained on aggregated annotations, which tend to overlook minority viewpoints and individual perspectives. This proposal investigates three directions of perspective-oriented research: first, annotation formats that better capture the granularity and uncertainty of individual judgments; second, annotation modeling that leverages socio-demographic features to better represent and predict underrepresented or minority perspectives; third, personalized text generation that tailors outputs to individual users’ preferences and communicative styles. The proposed tasks aim to advance natural language processing research towards more faithfully reflecting the diversity of human interpretation, enhancing both inclusiveness and fairness in language technologies.
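A minimal sketch of the first direction, assuming a soft-label annotation format in which each item keeps the full distribution of annotator judgments rather than a single aggregated label; the classifier and data below are hypothetical placeholders, not part of the proposal:

import torch
import torch.nn as nn

# Per-item soft labels: the proportion of annotators choosing each of three classes.
soft_labels = torch.tensor([[0.6, 0.3, 0.1],
                            [0.1, 0.1, 0.8]])
features = torch.randn(2, 16)  # stand-in for sentence representations

model = nn.Linear(16, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(10):
    log_probs = torch.log_softmax(model(features), dim=-1)
    # KL divergence to the annotator distribution keeps minority judgments in the
    # training signal, unlike cross-entropy against a single majority-vote label.
    loss = nn.functional.kl_div(log_probs, soft_labels, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()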
The paper describes our system for the Perspective Argument Retrieval Shared Task. The shared task consists of three scenarios in which relevant political arguments have to be retrieved: based on queries alone (Scenario 1), with explicit socio-cultural properties provided (Scenario 2), and with implicit socio-cultural properties that have to be inferred from the arguments themselves (Scenario 3). We combined a Bi-Encoder and a Cross-Encoder to retrieve relevant arguments for each query. For the third scenario, we extracted linguistic features to predict socio-demographic labels as a separate task. However, the socio-demographic matching task proved challenging due to the constraints of argument length and genre. The described system won both tracks of the shared task.
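A minimal sketch of a retrieve-then-rerank pipeline of the kind described, using the sentence-transformers library; the model names and the toy corpus are placeholders, not the exact components of the submitted system:

from sentence_transformers import SentenceTransformer, CrossEncoder, util

bi_encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

corpus = ["Raising taxes funds public services.",
          "Lower taxes stimulate private investment."]
corpus_emb = bi_encoder.encode(corpus, convert_to_tensor=True)

query = "Should taxes be raised?"
query_emb = bi_encoder.encode(query, convert_to_tensor=True)

# Stage 1: the Bi-Encoder retrieves a candidate set by cosine similarity.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]

# Stage 2: the Cross-Encoder rescores each (query, candidate) pair for the final ranking.
pairs = [(query, corpus[hit["corpus_id"]]) for hit in hits]
scores = cross_encoder.predict(pairs)
ranked = sorted(zip(scores, pairs), reverse=True, key=lambda x: x[0])
print(ranked)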
This study evaluates the extent to which semantic information is preserved within sentence embeddings generated by state-of-the-art sentence embedding models: SBERT and LaBSE. Specifically, we analyzed 13 semantic attributes in sentence embeddings. Our findings indicate that some semantic features (such as tense-related classes) can be decoded from sentence embedding representations. Additionally, we identify a limitation of current sentence embedding models: inferring meaning beyond the lexical level remains difficult.
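A minimal probing sketch along these lines, assuming a small labeled set with a binary tense attribute; the encoder name and the toy data are illustrative assumptions, not the paper's evaluation setup:

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

sentences = ["She walked to work.", "She walks to work.",
             "They visited Rome.", "They visit Rome."]
labels = [1, 0, 1, 0]  # 1 = past tense, 0 = present tense (one hypothetical attribute)

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
X = encoder.encode(sentences)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=0, stratify=labels)

# A linear probe: if a simple classifier can decode the attribute from the embeddings,
# the representation is taken to preserve that semantic feature.
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", accuracy_score(y_test, probe.predict(X_test)))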
The paper introduces our system for SemEval-2024 Task 1, which aims to predict the relatedness of sentence pairs. Operating under the hypothesis that semantic relatedness is a broader concept that extends beyond mere sentence similarity, our approach seeks to identify features useful for relatedness estimation. We employ an ensemble approach that integrates various systems, including statistical textual features and the outputs of deep learning models, to predict relatedness scores. The findings suggest that semantic relatedness can be inferred from various sources and that ensemble models outperform many individual systems in estimating semantic relatedness.
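A minimal ensemble sketch in this spirit: simple textual-overlap features are combined with a bi-encoder cosine score in a linear meta-model. The feature choices, model name, and data are illustrative assumptions rather than the system's actual components:

import numpy as np
from sentence_transformers import SentenceTransformer, util
from sklearn.linear_model import Ridge

pairs = [("A man is cooking.", "Someone prepares food."),
         ("A dog barks.", "The stock market fell.")]
gold = [0.8, 0.1]  # hypothetical relatedness scores in [0, 1]

encoder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def features(s1, s2):
    t1, t2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(t1 & t2) / max(len(t1 | t2), 1)                # token overlap (Jaccard)
    len_diff = abs(len(t1) - len(t2)) / max(len(t1), len(t2))    # length mismatch
    cos = util.cos_sim(encoder.encode(s1), encoder.encode(s2)).item()  # model-based score
    return [overlap, len_diff, cos]

X = np.array([features(a, b) for a, b in pairs])
ensemble = Ridge(alpha=1.0).fit(X, gold)  # linear meta-model over heterogeneous features
print(ensemble.predict(X))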