Leixin Zhang


2024

Unveiling Semantic Information in Sentence Embeddings
Leixin Zhang | David Burian | Vojtěch John | Ondřej Bojar
Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024

This study evaluates the extent to which semantic information is preserved within sentence embeddings generated by state-of-the-art sentence embedding models: SBERT and LaBSE. Specifically, we analyze 13 semantic attributes in sentence embeddings. Our findings indicate that some semantic features (such as tense-related classes) can be decoded from sentence embeddings. Additionally, we identify a limitation of current sentence embedding models: inferring meaning beyond the lexical level proves difficult.

Similarity-Based Cluster Merging for Semantic Change Modeling
Christopher Brückner | Leixin Zhang | Pavel Pecina
Proceedings of the 5th Workshop on Computational Approaches to Historical Language Change

Human and Machine: Language Processing in Translation Tasks
Hening Wang | Leixin Zhang | Ondrej Bojar
Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP 2024)

Twente-BMS-NLP at PerspectiveArg 2024: Combining Bi-Encoder and Cross-Encoder for Argument Retrieval
Leixin Zhang | Daniel Braun
Proceedings of the 11th Workshop on Argument Mining (ArgMining 2024)

The paper describes our system for the Perspective Argument Retrieval Shared Task. The shared task consists of three scenarios. In Scenario 1, relevant political arguments have to be retrieved based on queries. In Scenario 2, explicit socio-cultural properties are provided, and in Scenario 3, implicit socio-cultural properties within the arguments have to be used. We combined a Bi-Encoder and a Cross-Encoder to retrieve relevant arguments for each query. For the third scenario, we extracted linguistic features to predict socio-demographic labels as a separate task. However, the socio-demographic matching task proved challenging due to the constraints of argument lengths and genres. The described system won both tracks of the shared task.

Tübingen-CL at SemEval-2024 Task 1: Ensemble Learning for Semantic Relatedness Estimation
Leixin Zhang | Çağrı Çöltekin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

The paper introduces our system for SemEval-2024 Task 1, which aims to predict the relatedness of sentence pairs. Operating under the hypothesis that semantic relatedness is a broader concept that extends beyond mere similarity of sentences, our approach seeks to identify useful features for relatedness estimation. We employ an ensemble approach that integrates various systems, including statistical textual features and outputs of deep learning models, to predict relatedness scores. The findings suggest that semantic relatedness can be inferred from various sources and that ensemble models outperform many individual systems in estimating semantic relatedness.