2025
pdf
bib
abs
The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces
Ahmed Oumar El-Shangiti
|
Tatsuya Hiraoka
|
Hilal AlQuabeh
|
Benjamin Heinzerling
|
Kentaro Inui
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)
This paper investigates whether large language models (LLMs) utilize numerical attributes encoded in a low-dimensional subspace of theembedding space when answering questions involving numeric comparisons, e.g., Was Cristiano born before Messi? We first identified,using partial least squares regression, these subspaces, which effectively encode the numerical attributes associated with the entities in comparison prompts. Further, we demonstrate causality, by intervening in these subspaces to manipulate hidden states, thereby altering the LLM’s comparison outcomes. Experiments conducted on three different LLMs showed that our results hold across different numerical attributes, indicating that LLMs utilize the linearly encoded information for numerical reasoning.
2024
pdf
bib
abs
SAFARI: Cross-lingual Bias and Factuality Detection in News Media and News Articles
Dilshod Azizov
|
Zain Muhammad Mujahid
|
Hilal AlQuabeh
|
Preslav Nakov
|
Shangsong Liang
Findings of the Association for Computational Linguistics: EMNLP 2024
In an era where information is quickly shared across many cultural and language contexts, the neutrality and integrity of news media are essential. Ensuring that media content remains unbiased and factual is crucial for maintaining public trust. With this in mind, we introduce SAFARI (CroSs-lingual BiAs and Factuality Detection in News MediA and News ARtIcles), a novel corpus of news media and articles for predicting political bias and the factuality of reporting in a multilingual and cross-lingual setup. To the best of our knowledge, this corpus is unprecedented in its collection and introduces a dataset for political bias and factuality for three tasks: (i) media-level, (ii) article-level, and (iii) joint modeling at the article-level. At the media and article levels, we evaluate the cross-lingual ability of the models; however, in joint modeling, we evaluate on English data. Our frameworks set a new benchmark in the cross-lingual evaluation of political bias and factuality. This is achieved through the use of various Multilingual Pre-trained Language Models (MPLMs) and Large Language Models (LLMs) coupled with ensemble learning methods.
2023
pdf
bib
abs
Lotus at WojoodNER Shared Task: Multilingual Transformers: Unveiling Flat and Nested Entity Recognition
Jiyong Li
|
Dilshod Azizov
|
Hilal AlQuabeh
|
Shangsong Liang
Proceedings of ArabicNLP 2023
We introduce our systems developed for two subtasks in the shared task “Wojood” on Arabic NER detection, part of ArabicNLP 2023. For Subtask 1, we employ the XLM-R model to predict Flat NER labels for given tokens using a single classifier capable of categorizing all labels. For Subtask 2, we use the XLM-R encoder by building 21 individual classifiers. Each classifier corresponds to a specific label and is designed to determine the presence of its respective label. In terms of performance, our systems achieved competitive micro-F1 scores of 0.83 for Subtask 1 and 0.76 for Subtask 2, according to the leaderboard scores.