RELATIONS - Workshop on meaning relations between phrases and sentences

Venelin Kovatchev, Darina Gold, Torsten Zesch (Editors)


Anthology ID:
W19-08
Month:
May
Year:
2019
Address:
Gothenburg, Sweden
Venue:
IWCS
SIG:
SIGSEM
Publisher:
Association for Computational Linguistics
URL:
https://aclanthology.org/W19-08
PDF:
https://preview.aclanthology.org/ingestion-script-update/W19-08.pdf

RELATIONS - Workshop on meaning relations between phrases and sentences
Venelin Kovatchev | Darina Gold | Torsten Zesch

Assessing the Difficulty of Classifying ConceptNet Relations in a Multi-Label Classification Setting
Maria Becker | Michael Staniek | Vivi Nastase | Anette Frank

Commonsense knowledge relations are crucial for advanced NLU tasks. We examine the learnability of such relations as represented in ConceptNet, taking into account their specific properties, which can make relation classification difficult: a given concept pair can be linked by multiple relation types, and relations can have multi-word arguments of diverse semantic types. We explore a neural open-world multi-label classification approach that focuses on evaluating classification accuracy for individual relations. Based on an in-depth study of the specific properties of the ConceptNet resource, we investigate the impact of different relation representations and model variations. Our analysis reveals that the complexity of argument types and relation ambiguity are the most important challenges to address. We also design a customized evaluation method, which can be expanded in future work, to address the incompleteness of the resource.
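
As an illustration only (not the authors' code), per-relation evaluation in a multi-label setting can be sketched as follows: each concept pair carries a set of gold relation labels and a set of predicted labels, and precision, recall and F1 are computed separately for every relation. The function name and toy data are hypothetical, and the sketch does not implement the paper's customized evaluation for the resource's incompleteness.

```python
from collections import defaultdict


def per_relation_scores(gold, pred):
    """Per-relation (precision, recall, F1) for multi-label predictions.

    gold, pred: parallel lists of sets, one set of relation labels per concept pair.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        for label in p & g:  # predicted and correct
            tp[label] += 1
        for label in p - g:  # predicted but not in gold
            fp[label] += 1
        for label in g - p:  # in gold but missed
            fn[label] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[label] = (prec, rec, f1)
    return scores


# Hypothetical toy data using ConceptNet-style relation names.
gold = [{"IsA", "UsedFor"}, {"PartOf"}, {"UsedFor"}]
pred = [{"IsA"}, {"PartOf", "UsedFor"}, {"UsedFor"}]
scores = per_relation_scores(gold, pred)
```

Reporting one score per relation, rather than a single micro-averaged number, makes it visible which relation types are hard to learn.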

Detecting Collocations Similarity via Logical-Linguistic Model
Nina Khairova | Svitlana Petrasova | Orken Mamyrbayev | Kuralay Mukhsina

Semantic similarity between collocations, along with word similarity, is one of the main issues in NLP and must be addressed, in particular, to facilitate automatic thesaurus generation. In this paper, we consider a logical-linguistic model that defines the relation of semantic similarity between collocations via logical-algebraic equations. We provide the model for English, Ukrainian and Russian text corpora. The implementations for the three languages differ slightly in the equations of the finite predicate algebra and in the linguistic resources used. As datasets for our experiments, we use 5,801 sentence pairs from the Microsoft Research Paraphrase Corpus for English and more than 1,000 texts of scientific papers for Russian and Ukrainian.

Detecting Paraphrases of Standard Clause Titles in Insurance Contracts
Frieda Josi | Christian Wartena | Ulrich Heid

For the analysis of contract texts, validated model texts, such as model clauses, can be used to identify reused contract clauses. This paper investigates how to calculate the similarity between titles of model clauses and headings extracted from contracts, and which similarity measure is most suitable for this task. To calculate the similarity between title pairs, we tested several variants of string similarity and token-based similarity. We also compare two additional, more semantic similarity measures based on word embeddings: one using pretrained embeddings and one using embeddings trained on contract texts. Identifying the model clause title can serve as a starting point for mapping clauses found in contracts to verified clauses.
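
A minimal sketch of the two families of surface measures named above, assuming Python's standard `difflib.SequenceMatcher` ratio as the character-level string measure and token-set Jaccard overlap as the token-based measure. These specific choices, the helper names and the example titles are hypothetical and are not taken from the paper.

```python
import re
from difflib import SequenceMatcher


def tokens(text):
    """Lowercased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Token-based similarity: |A & B| / |A | B| over word sets."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def char_ratio(a, b):
    """String similarity: difflib's matching-subsequence ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_title(heading, titles, sim=jaccard):
    """Return the model clause title most similar to an extracted heading."""
    return max(titles, key=lambda t: sim(heading, t))


# Hypothetical model clause titles.
model_titles = ["Liability for damages", "Termination of the contract", "Premium payment"]
match = best_title("Damages liability", model_titles)
```

The token-based measure ignores word order, so "Damages liability" still matches "Liability for damages"; a character-level measure is more sensitive to reordering but tolerates spelling variants.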

Semantic Matching of Documents from Heterogeneous Collections: A Simple and Transparent Method for Practical Applications
Mark-Christoph Mueller

We present a very simple, unsupervised method for the pairwise matching of documents from heterogeneous collections. We demonstrate our method on the Concept-Project matching task, a binary classification task involving pairs of documents from heterogeneous collections. Although our method only employs standard resources without any domain- or task-specific modifications, it clearly outperforms the more complex system of the task's original authors. In addition, our method is transparent, because it provides explicit information about how a similarity score was computed, and efficient, because it is based on the aggregation of (pre-computable) word-level similarities.
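
One common way to aggregate word-level similarities into a document score, shown here as a hypothetical sketch rather than the author's exact formula, is to align each word with its most similar word in the other document and average those cosine scores in both directions. The toy two-dimensional embeddings stand in for real pretrained vectors, and all names are illustrative; returning the word alignments mirrors the kind of transparency the abstract describes.

```python
import math


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def directed_score(src, tgt, emb):
    """Average best-match cosine of each src word against tgt, plus the alignments used."""
    alignments = []
    for w in src:
        if w not in emb:
            continue  # skip out-of-vocabulary words
        best = max(((cosine(emb[w], emb[t]), t) for t in tgt if t in emb), default=None)
        if best is not None:
            alignments.append((w, best[1], best[0]))
    score = sum(s for _, _, s in alignments) / len(alignments) if alignments else 0.0
    return score, alignments


def doc_similarity(a, b, emb):
    """Symmetric score: mean of both directed scores; alignments explain the result."""
    s_ab, al_ab = directed_score(a, b, emb)
    s_ba, al_ba = directed_score(b, a, emb)
    return (s_ab + s_ba) / 2, al_ab + al_ba


# Hypothetical toy embeddings.
emb = {
    "solar": [1.0, 0.0],
    "energy": [0.9, 0.1],
    "sun": [1.0, 0.1],
    "power": [0.8, 0.2],
    "bread": [0.0, 1.0],
}
related, why = doc_similarity(["solar", "energy"], ["sun", "power"], emb)
unrelated, _ = doc_similarity(["solar", "energy"], ["bread"], emb)
```

Because word-to-word cosine scores do not depend on which documents are compared, they could be computed once and cached, which keeps this style of aggregation cheap.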