Alexander Ziem


2026

Current methods for automatically assigning frames to the words that evoke them fall into two groups: frame identification and frame induction. In frame identification, frame names from a labeled dataset are assigned to unseen instances, a classical supervised labeling task. However, the training datasets are known to be incomplete with respect to real-world frames, so unseen instances may evoke frames for which no label exists. In frame induction, instances are clustered according to the frames they evoke, a classical unsupervised clustering task. However, existing training data is not used to identify known frames. To overcome these shortcomings, we propose semi-supervised clustering for combined frame identification and frame induction. Constrained clustering with hard constraints derived from the labeled data ensures that all labeled instances within a resulting cluster share the same label, so frame names can easily be assigned. We show for English and German datasets that semi-supervised clustering improves the quality of frame induction compared to unsupervised clustering methods and yields notably good frame identification performance.
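The idea of combining identification and induction through hard constraints can be illustrated with a small sketch. It is not the authors' implementation: it assumes each instance is already represented as an embedding vector, uses a greedy average-linkage agglomerative clustering with cosine distance and a merge threshold, and interprets the hard constraints as must-link (same frame label) and cannot-link (different frame labels). The function names and the `NEW_FRAME_<n>` placeholders for induced frames are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance between two (assumed non-zero) embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def cluster_frame(cluster: List[int], labels: Sequence[Optional[str]]) -> Optional[str]:
    """Return the known frame of a cluster, or None if it has no labeled member."""
    known = {labels[i] for i in cluster if labels[i] is not None}
    return next(iter(known)) if known else None


def constrained_agglomerative(vectors, labels, threshold):
    """Greedy average-linkage clustering under hypothetical hard constraints."""
    # Must-link (assumption): all instances sharing a frame label start in one cluster.
    seeded = {}
    clusters = []
    for i, label in enumerate(labels):
        if label is None:
            clusters.append([i])
        else:
            seeded.setdefault(label, []).append(i)
    clusters.extend(seeded.values())

    def average_linkage(c1, c2):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                fa = cluster_frame(clusters[a], labels)
                fb = cluster_frame(clusters[b], labels)
                # Cannot-link: never merge clusters that hold different known frames.
                if fa is not None and fb is not None and fa != fb:
                    continue
                d = average_linkage(clusters[a], clusters[b])
                if d <= threshold and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting index b keeps index a valid
        del clusters[b]


def assign_frame_names(clusters, labels):
    """Identification for clusters with a labeled member, induction for the rest."""
    names = [None] * len(labels)
    new_id = 0
    for cluster in clusters:
        frame = cluster_frame(cluster, labels)
        if frame is None:
            frame = f"NEW_FRAME_{new_id}"  # hypothetical name for an induced frame
            new_id += 1
        for i in cluster:
            names[i] = frame
    return names


if __name__ == "__main__":
    vectors = [(1.0, 0.0), (0.95, 0.05), (0.0, 1.0), (0.05, 0.95), (-1.0, 0.1), (-0.9, 0.2)]
    labels = ["Motion", None, "Commerce_buy", None, None, None]
    clusters = constrained_agglomerative(vectors, labels, threshold=0.1)
    print(assign_frame_names(clusters, labels))
    # ['Motion', 'Motion', 'Commerce_buy', 'Commerce_buy', 'NEW_FRAME_0', 'NEW_FRAME_0']
```

Because of the cannot-link check, a cluster can never end up with two different known frames, which is what makes naming trivial: unlabeled instances inherit the frame of the labeled instances they are clustered with, and clusters without any labeled member become candidates for new frames. The toy vectors and the threshold value are purely illustrative.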

2024

This paper presents MoCCA, a Model of Comparative Concepts for Aligning Constructicons, which is under development by a consortium of research groups building Constructicons for different languages, including Brazilian Portuguese, English, German and Swedish. The Constructicons will be aligned by means of comparative concepts (CCs), which provide language-neutral definitions of linguistic properties. The CCs are drawn from typological research on grammatical categories and constructions as well as from FrameNet frames, and are organized in a conceptual network. Language-specific constructions are linked to the CCs in accordance with general principles. MoCCA is organized into files of two types: a largely static CC Database file and multiple Linking files containing relations between the constructions in a Constructicon and the CCs. Tools are planned to facilitate visualization of the CC network and the linking of constructions to CCs. All files and guidelines will be versioned, and a mechanism has been set up to report cases where a language-specific construction cannot easily be linked to existing CCs.
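The abstract does not specify a file format, so the following is only a hypothetical sketch of the two-file organization it describes: a dictionary stands in for the CC Database file and a list of `Link` records for one or more Linking files. All class names, field names and identifiers are invented for illustration, and grouping constructions that link to the same CC is an assumed reading of how alignment through CCs could work.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ComparativeConcept:
    """Hypothetical entry of the largely static CC Database file."""
    cc_id: str
    definition: str                  # language-neutral definition
    parents: Tuple[str, ...] = ()    # edges to broader CCs in the conceptual network


@dataclass(frozen=True)
class Link:
    """Hypothetical row of a Linking file: one construction related to one CC."""
    constructicon: str               # language of the Constructicon, e.g. "German"
    construction_id: str
    cc_id: str


def dangling_links(cc_db: Dict[str, ComparativeConcept], links: Iterable[Link]) -> List[Link]:
    """Links pointing to a CC that is missing from the database."""
    return [link for link in links if link.cc_id not in cc_db]


def unlinked_constructions(construction_ids: Iterable[str], links: Iterable[Link]) -> List[str]:
    """Constructions without any CC link: candidates for the reporting mechanism."""
    linked = {link.construction_id for link in links}
    return sorted(c for c in construction_ids if c not in linked)


def aligned(links: Iterable[Link], cc_id: str) -> Dict[str, List[str]]:
    """Group constructions from all Constructicons that link to the same CC."""
    result: Dict[str, List[str]] = {}
    for link in links:
        if link.cc_id == cc_id:
            result.setdefault(link.constructicon, []).append(link.construction_id)
    return result


if __name__ == "__main__":
    cc_db = {
        "polar_question": ComparativeConcept("polar_question", "Hypothetical definition."),
        "tag_question": ComparativeConcept("tag_question", "Hypothetical definition.", ("polar_question",)),
    }
    links = [
        Link("English", "en:tag_question", "tag_question"),
        Link("German", "de:tag_question", "tag_question"),
    ]
    print(aligned(links, "tag_question"))
    print(unlinked_constructions(["en:tag_question", "de:tag_question", "pt:other"], links))
```

Keeping the CC records separate from the links mirrors the split between one static database and many per-Constructicon Linking files: each consortium member can edit its own links without touching the shared CC network.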

2020

Framenets, as incarnations of frame semantics, were set up to deal with lexicographic issues (cf. Fillmore and Baker 2010, among others). They are thus concerned with lexical units (LUs) and the conceptual structures that group them together. These lexically evoked frames, however, do not reflect pragmatic properties of constructions (LUs and other types of constructions), such as expressing illocutions or being considered polite or very informal. From the viewpoint of a multilingual annotation effort, the Global FrameNet Shared Annotation Task, we discuss two phenomena, greetings and tag questions, which highlight the need both to investigate the relation between construction annotation and frame annotation and to develop pragmatic frames describing social interactions that are not explicitly lexicalized.