Tomáš Brychcín

2018

pdf bib abs
UWB at SemEval-2018 Task 10: Capturing Discriminative Attributes from Word Distributions
Tomáš Brychcín | Tomáš Hercig | Josef Steinberger | Michal Konkol
Proceedings of The 12th International Workshop on Semantic Evaluation

We present our UWB system for the task of capturing discriminative attributes at SemEval 2018. Given two words and an attribute, the system decides, whether this attribute is discriminative between the words or not. Assuming Distributional Hypothesis, i.e., a word meaning is related to the distribution across contexts, we introduce several approaches to compare word contextual information. We experiment with state-of-the-art semantic spaces and with simple co-occurrence statistics. We show the word distribution in the corpus has potential for detecting discriminative attributes. Our system achieves F1 score 72.1% and is ranked #4 among 26 submitted systems.

2017

pdf bib abs
Geographical Evaluation of Word Embeddings
Michal Konkol | Tomáš Brychcín | Michal Nykl | Tomáš Hercig
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Word embeddings are commonly compared either with human-annotated word similarities or through improvements in natural language processing tasks. We propose a novel principle which compares the information from word embeddings with reality. We implement this principle by comparing the information in the word embeddings with geographical positions of cities. Our evaluation linearly transforms the semantic space to optimally fit the real positions of cities and measures the deviation between the position given by word embeddings and the real position. A set of well-known word embeddings with state-of-the-art results were evaluated. We also introduce a visualization that helps with error analysis.

pdf bib abs
Cross-lingual Flames Detection in News Discussions
Josef Steinberger | Tomáš Brychcín | Tomáš Hercig | Peter Krejzl
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We introduce Flames Detector, an online system for measuring flames, i.e. strong negative feelings or emotions, insults or other verbal offences, in news commentaries across five languages. It is designed to assist journalists, public institutions or discussion moderators to detect news topics which evoke wrangles. We propose a machine learning approach to flames detection and calculate an aggregated score for a set of comment threads. The demo application shows the most flaming topics of the current period in several language variants. The search functionality gives a possibility to measure flames in any topic specified by a query. The evaluation shows that the flame detection in discussions is a difficult task, however, the application can already reveal interesting information about the actual news discussions.

pdf bib abs
Pyramid-based Summary Evaluation Using Abstract Meaning Representation
Josef Steinberger | Peter Krejzl | Tomáš Brychcín
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017

We propose a novel metric for evaluating summary content coverage. The evaluation framework follows the Pyramid approach to measure how many summarization content units, considered important by human annotators, are contained in an automatic summary. Our approach automatizes the evaluation process, which does not need any manual intervention on the evaluated summary side. Our approach compares abstract meaning representations of each content unit mention and each summary sentence. We found that the proposed metric complements well the widely-used ROUGE metrics.

pdf bib abs
Unsupervised Dialogue Act Induction using Gaussian Mixtures
Tomáš Brychcín | Pavel Král
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

This paper introduces a new unsupervised approach for dialogue act induction. Given the sequence of dialogue utterances, the task is to assign them the labels representing their function in the dialogue. Utterances are represented as real-valued vectors encoding their meaning. We model the dialogue as Hidden Markov model with emission probabilities estimated by Gaussian mixtures. We use Gibbs sampling for posterior inference. We present the results on the standard Switchboard-DAMSL corpus. Our algorithm achieves promising results compared with strong supervised baselines and outperforms other unsupervised algorithms.