Jakub Szymanik


HerBERT Based Language Model Detects Quantifiers and Their Semantic Properties in Polish
Marcin Woliński | Bartłomiej Nitoń | Witold Kieraś | Jakub Szymanik
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The paper presents a tool for automatically marking up quantifying expressions, their semantic features, and their scopes. We explore the idea of using a BERT-based neural model for the task (in this case HerBERT, a model trained specifically for Polish, is used). The tool is trained on a recent manually annotated Corpus of Polish Quantificational Expressions (Szymanik and Kieraś, 2022). We discuss how it performs against human annotation and present the results of automatically annotating a 300-million-word sub-corpus of the National Corpus of Polish. Our results show that language models can effectively recognise the semantic category of quantification as well as identify key semantic properties of quantifiers, such as monotonicity. Furthermore, the algorithm we have developed can be used to build semantically annotated quantifier corpora for other languages.


Language Models Use Monotonicity to Assess NPI Licensing
Jaap Jumelet | Milica Denic | Jakub Szymanik | Dieuwke Hupkes | Shane Steinert-Threlkeld
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

‘Most’ vs ‘More Than Half’: An Alternatives Explanation
Fausto Carcassi | Jakub Szymanik
Proceedings of the Society for Computation in Linguistics 2021


Some of Them Can be Guessed! Exploring the Effect of Linguistic Context in Predicting Quantifiers
Sandro Pezzelle | Shane Steinert-Threlkeld | Raffaella Bernardi | Jakub Szymanik
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

We study the role of linguistic context in predicting quantifiers (‘few’, ‘all’). We collect crowdsourced data from human participants and test various models in a local (single-sentence) and a global (multi-sentence) context condition. Models significantly outperform humans in the former setting and are only slightly better in the latter. While human performance improves with more linguistic context (especially on proportional quantifiers), model performance suffers. Models are very effective in exploiting lexical and morpho-syntactic patterns; humans are better at genuinely understanding the meaning of the (global) context.


Semantic Complexity of Quantifiers and Their Distribution in Corpora
Jakub Szymanik | Camilo Thorne
Proceedings of the 11th International Conference on Computational Semantics


Pragmatic identification of the witness sets
Livio Robaldo | Jakub Szymanik
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Among the readings available for NL sentences, those where two or more sets of entities are independent of one another are particularly challenging from both a theoretical and an empirical point of view. Such readings are termed here ‘Independent Set (IS) readings’. Standard examples are the well-known Collective and Cumulative readings. Robaldo (2011) proposes a logical framework that can properly represent the meaning of IS readings in terms of a set-Skolemization of the witness sets. One of the main assumptions of Robaldo's logical framework, drawn from Schwarzschild (1996), is that pragmatics plays a crucial role in the identification of such witness sets. These are first identified on pragmatic grounds; logical clauses are then asserted on them in order to trigger the appropriate inferences. In this paper, we present the results of an experimental analysis that appears to confirm Robaldo's hypotheses concerning the pragmatic identification of the witness sets.


Interpreting tractable versus intractable reciprocal sentences
Oliver Bott | Fabian Schlotterbeck | Jakub Szymanik
Proceedings of the Ninth International Conference on Computational Semantics (IWCS 2011)