“I Need More Context and an English Translation”: Analysing How LLMs Identify Personal Information in Komi, Polish, and English

Nikolai Ilinykh; Maria Irena Szawerna

Abstract

Automatic identification of personal information (PI) is particularly difficult for languages with limited linguistic resources. Recently, large language models (LLMs) have been applied to various tasks involving low-resourced languages, but their capability to process PI in such contexts remains under-explored. In this paper, we provide a qualitative analysis of the outputs from three LLMs prompted to identify PI in texts written in Komi (Permyak and Zyrian), Polish, and English. Our analysis highlights challenges in using pre-trained LLMs for PI identification in both low- and medium-resourced languages. It also motivates the development of LLMs that understand how the expression of PI differs across languages with varying levels of linguistic resource availability.

Anthology ID: 2025.resourceful-1.32
Volume: Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025)
Month: March
Year: 2025
Address: Tallinn, Estonia
Editors: Špela Arhar Holdt, Nikolai Ilinykh, Barbara Scalvini, Micaella Bruton, Iben Nyholm Debess, Crina Madalina Tudor
Venues: RESOURCEFUL | WS
Publisher: University of Tartu Library, Estonia
Pages: 165–178
URL: https://preview.aclanthology.org/ingest-emnlp/2025.resourceful-1.32/
Cite (ACL): Nikolai Ilinykh and Maria Irena Szawerna. 2025. “I Need More Context and an English Translation”: Analysing How LLMs Identify Personal Information in Komi, Polish, and English. In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 165–178, Tallinn, Estonia. University of Tartu Library, Estonia.
Cite (Informal): “I Need More Context and an English Translation”: Analysing How LLMs Identify Personal Information in Komi, Polish, and English (Ilinykh & Szawerna, RESOURCEFUL 2025)
PDF: https://preview.aclanthology.org/ingest-emnlp/2025.resourceful-1.32.pdf