2024
MoCCA: A Model of Comparative Concepts for Aligning Constructicons
Arthur Lorenzi | Peter Ljunglöf | Ben Lyngfelt | Tiago Timponi Torrent | William Croft | Alexander Ziem | Nina Böbel | Linnéa Bäckström | Peter Uhrig | Ely E. Matos
Proceedings of the 20th Joint ACL - ISO Workshop on Interoperable Semantic Annotation @ LREC-COLING 2024
This paper presents MoCCA, a Model of Comparative Concepts for Aligning Constructicons under development by a consortium of research groups building Constructicons of different languages including Brazilian Portuguese, English, German and Swedish. The Constructicons will be aligned by using comparative concepts (CCs) providing language-neutral definitions of linguistic properties. The CCs are drawn from typological research on grammatical categories and constructions, and from FrameNet frames, organized in a conceptual network. Language-specific constructions are linked to the CCs in accordance with general principles. MoCCA is organized into files of two types: a largely static CC Database file and multiple Linking files containing relations between constructions in a Constructicon and the CCs. Tools are planned to facilitate visualization of the CC network and linking of constructions to the CCs. All files and guidelines will be versioned, and a mechanism is set up to report cases where a language-specific construction cannot be easily linked to existing CCs.
Frame2: A FrameNet-based Multimodal Dataset for Tackling Text-image Interactions in Video
Frederico Belcavello | Tiago Timponi Torrent | Ely E. Matos | Adriana S. Pagano | Maucha Gamonal | Natalia Sigiliano | Lívia Vicente Dutra | Helen de Andrade Abreu | Mairon Samagaio | Mariane Carvalho | Franciany Campos | Gabrielly Azalim | Bruna Mazzei | Mateus Fonseca de Oliveira | Ana Carolina Loçasso Luz | Lívia Pádua Ruiz | Júlia Bellei | Amanda Pestana | Josiane Costa | Iasmin Rabelo | Anna Beatriz Silva | Raquel Roza | Mariana Souza | Igor Oliveira
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This paper presents the Frame2 dataset, a multimodal dataset built from a corpus of a Brazilian travel TV show annotated for FrameNet categories for both the text and image communicative modes. Frame2 comprises 230 minutes of video, which are correlated with 2,915 sentences that either transcribe the audio spoken during the episodes or subtitle the segments of the show where the host conducts interviews in English. For this first release of the dataset, a total of 11,796 annotation sets for the sentences and 6,841 for the video are included. Each of the former includes a target lexical unit evoking a frame or one or more frame elements. For each video annotation, a bounding box in the image is correlated with a frame, a frame element and a lexical unit evoking a frame in FrameNet.
Framed Multi30K: A Frame-Based Multimodal-Multilingual Dataset
Marcelo Viridiano | Arthur Lorenzi | Tiago Timponi Torrent | Ely E. Matos | Adriana S. Pagano | Natália Sathler Sigiliano | Maucha Gamonal | Helen de Andrade Abreu | Lívia Vicente Dutra | Mairon Samagaio | Mariane Carvalho | Franciany Campos | Gabrielly Azalim | Bruna Mazzei | Mateus Fonseca de Oliveira | Ana Carolina Luz | Livia Padua Ruiz | Júlia Bellei | Amanda Pestana | Josiane Costa | Iasmin Rabelo | Anna Beatriz Silva | Raquel Roza | Mariana Souza Mota | Igor Oliveira | Márcio Henrique Pelegrino de Freitas
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This paper presents Framed Multi30K (FM30K), a novel frame-based Brazilian Portuguese multimodal-multilingual dataset which (i) extends the Multi30K dataset (Elliott et al., 2016) with 158,915 original Brazilian Portuguese descriptions and 30,104 Brazilian Portuguese translations from original English descriptions; (ii) adds 2,677,613 frame evocation labels to the 158,915 English descriptions and to the ones created for Brazilian Portuguese; and (iii) extends the Flickr30k Entities dataset (Plummer et al., 2015) with 190,608 frames and Frame Elements correlations with the existing phrase-to-region correlations.
2023
Modeling Construction Grammar’s Way into NLP: Insights from negative results in automatically identifying schematic clausal constructions in Brazilian Portuguese
Arthur Lorenzi | Vânia Gomes de Almeida | Ely Edison Matos | Tiago Timponi Torrent
Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023)
This paper reports on negative results in a task of automatic identification of schematic clausal constructions and their elements in Brazilian Portuguese. The experiment was set up so as to test whether form and meaning properties of constructions, modeled in terms of Universal Dependencies and FrameNet Frames in a Constructicon, would improve the performance of transformer models in the task. Qualitative analysis of the results indicates that alternatives to the linearization of those properties, dataset size and a post-processing module should be explored in the future as a means to make use of information in Constructicons for NLP tasks.
2022
Charon: A FrameNet Annotation Tool for Multimodal Corpora
Frederico Belcavello | Marcelo Viridiano | Ely Matos | Tiago Timponi Torrent
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022
This paper presents Charon, a web tool for annotating multimodal corpora with FrameNet categories. Annotation can be made for corpora containing both static images and video sequences paired – or not – with text sequences. The pipeline features, besides the annotation interface, corpus import and pre-processing tools.
Domain Adaptation in Neural Machine Translation using a Qualia-Enriched FrameNet
Alexandre Diniz da Costa | Mateus Coutinho Marim | Ely Matos | Tiago Timponi Torrent
Proceedings of the Thirteenth Language Resources and Evaluation Conference
In this paper we present Scylla, a methodology for domain adaptation of Neural Machine Translation (NMT) systems that makes use of a multilingual FrameNet enriched with qualia relations as an external knowledge base. Domain adaptation techniques used in NMT usually require fine-tuning and in-domain training data, which may pose difficulties for those working with lesser-resourced languages and may also lead to performance decay of the NMT system for out-of-domain sentences. Scylla does not require fine-tuning of the NMT model, avoiding the risk of model over-fitting and the consequent decrease in performance for out-of-domain translations. Two versions of Scylla are presented: one using the source sentence as input, and another one using the target sentence. We evaluate Scylla in comparison to a state-of-the-art commercial NMT system in an experiment in which 50 sentences from the Sports domain are translated from Brazilian Portuguese to English. The two versions of Scylla significantly outperform the baseline commercial system in HTER.
Frame Shift Prediction
Zheng Xin Yong | Patrick D. Watson | Tiago Timponi Torrent | Oliver Czulo | Collin Baker
Proceedings of the Thirteenth Language Resources and Evaluation Conference
Frame shift is a cross-linguistic phenomenon in translation which results in corresponding pairs of linguistic material evoking different frames. The ability to predict frame shifts would enable (semi-)automatic creation of multilingual frame annotations and thus speed up FrameNet creation through annotation projection. Here, we first characterize how frame shifts result from other linguistic divergences such as translational divergences and construal differences. Our analysis also shows that many pairs of frames in frame shifts are multiple hops away from each other in Berkeley FrameNet’s net-like configuration. Then, we propose the Frame Shift Prediction task and demonstrate that our graph attention networks, combined with auxiliary training, can learn cross-linguistic frame-to-frame correspondence and predict frame shifts.
Lutma: A Frame-Making Tool for Collaborative FrameNet Development
Tiago Timponi Torrent | Arthur Lorenzi | Ely Edison Matos | Frederico Belcavello | Marcelo Viridiano | Maucha Andrade Gamonal
Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022
This paper presents Lutma, a collaborative, semi-constrained, tutorial-based tool for contributing frames and lexical units to the Global FrameNet initiative. The tool parameterizes the process of frame creation, avoiding consistency violations and promoting the integration of frames contributed by the community with existing frames. Lutma is structured in a wizard-like fashion so as to provide users with text and video tutorials relevant for each step in the frame creation process. We argue that this tool will allow for a sensible expansion of FrameNet coverage in terms of both languages and cultural perspectives encoded by them, positioning frames as a viable alternative for representing perspective in language models.
The Case for Perspective in Multimodal Datasets
Marcelo Viridiano | Tiago Timponi Torrent | Oliver Czulo | Arthur Lorenzi | Ely Matos | Frederico Belcavello
Proceedings of the 1st Workshop on Perspectivist Approaches to NLP @LREC2022
This paper argues in favor of the adoption of annotation practices for multimodal datasets that recognize and represent the inherently perspectivized nature of multimodal communication. To support our claim, we present a set of annotation experiments in which FrameNet annotation is applied to the Multi30k and the Flickr 30k Entities datasets. We assess the cosine similarity between the semantic representations derived from the annotation of both pictures and captions for frames. Our findings indicate that: (i) frame semantic similarity between captions of the same picture produced in different languages is sensitive to whether the caption is a translation of another caption or not, and (ii) picture annotation for semantic frames is sensitive to whether the image is annotated in the presence of a caption or not.
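As a rough illustration of the cosine-similarity comparison mentioned in this abstract, the sketch below compares two captions represented as bags of evoked frames. The abstract does not say how the semantic representations were built, so the bag-of-frames encoding, the function names frame_vector and cosine_similarity, and the example frame lists are all assumptions made here for illustration, not the authors' method.

```python
import math
from collections import Counter


def frame_vector(frames):
    """Hypothetical representation: count how often each frame is evoked."""
    return Counter(frames)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse frame-count vectors."""
    # Only frames present in both vectors contribute to the dot product.
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An annotation with no frames is treated as sharing nothing.
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative (invented) frame annotations for two captions of one picture.
caption_a = frame_vector(["People", "Self_motion", "Roadways"])
caption_b = frame_vector(["People", "Self_motion", "Buildings"])

similarity = cosine_similarity(caption_a, caption_b)
print(f"{similarity:.3f}")
```

With this encoding, two captions that share two of their three frames score about 0.667, while captions with no frames in common score 0. Other representations (for example, dense embeddings of frames) would use the same similarity function over different vectors.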