Bruno Martins

2022

Interest in argument mining has resulted in an increasing number of argument annotated corpora. However, most focus on English texts with explicit argumentative discourse markers, such as persuasive essays or legal documents. Conversely, we report on the first extensive and consolidated Portuguese argument annotation project focused on opinion articles. We briefly describe the annotation guidelines based on a multi-layered process and analyze the manual annotations produced, highlighting the main challenges of this textual genre. We then conduct a comprehensive inter-annotator agreement analysis, including argumentative discourse units, their classes and relations, and resulting graphs. This analysis reveals that each of these aspects tackles very different kinds of challenges. We observe differences in annotator profiles, motivating our aim of producing a non-aggregated corpus containing the insights of every annotator. We note that the interpretation and identification of token-level arguments is challenging; nevertheless, tasks that focus on higher-level components of the argument structure can obtain considerable agreement. We lay down perspectives on corpus usage, exploiting its multi-faceted nature.

pdf abs
Dense Template Retrieval for Customer Support
Tiago Mesquita | Bruno Martins | Mariana Almeida
Proceedings of the 29th International Conference on Computational Linguistics

Templated answers are used extensively in customer support scenarios, providing an efficient way to cover a plethora of topics, with an easily maintainable collection of templates. However, the number of templates is often too high for an agent to manually search. Automatically suggesting the correct template for a given question can thus improve the service efficiency, reducing costs and leading to a better customer satisfaction. In this work, we propose a dense retrieval framework for the customer support scenario, adapting a standard in-batch negatives technique to support unpaired sampling of queries and templates. We also propose a novel loss that extends the typical query-centric similarity, exploiting other similarity relations in the training data. Experiments show that our approach achieves considerable improvements, in terms of performance and training speed, over more standard dense retrieval methods. This includes methods such as DPR, and also ablated versions of the proposed approach.

2020

2011

This work describes a process to extract Named Entity (NE) translations from the text available in web links (anchor texts). It translates a NE by retrieving a list of web documents in the target language, extracting the anchor texts from the links to those documents and finding the best translation from the anchor texts, using a combination of features, some of which, are specific to anchor texts. Experiments performed on a manually built corpora, suggest that over 70% of the NEs, ranging from unpopular to popular entities, can be translated correctly using sorely anchor texts. Tests on a Machine Translation task indicate that the system can be used to improve the quality of the translations of state-of-the-art statistical machine translation systems.

Co-authors

Venues

semeval6
lrec1
iwslt1
emnlp1
eamt1
show all...

coling1

Bruno Martins

2022

2020

2015

2014

2013

2011

Co-authors

Venues