Automatic Identification of AltLexes using Monolingual Parallel Corpora

Elnaz Davoodi, Leila Kosseim


Abstract
The automatic identification of discourse relations is still a challenging task in natural language processing. Discourse connectives, such as since or but, are the most informative cues to identify explicit relations; however, discourse parsers typically use a closed inventory of such connectives. As a result, discourse relations signalled by markers outside these inventories (i.e. AltLexes) are not detected as effectively. In this paper, we propose a novel method that leverages parallel corpora from text simplification, together with lexical resources, to automatically identify alternative lexicalizations that signal discourse relations. When applied to the Simple Wikipedia and Newsela corpora along with WordNet and the PPDB, the method allowed the automatic discovery of 91 AltLexes.
Anthology ID:
R17-1027
Volume:
Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017
Month:
September
Year:
2017
Address:
Varna, Bulgaria
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
195–200
URL:
https://doi.org/10.26615/978-954-452-049-6_027
DOI:
10.26615/978-954-452-049-6_027
Cite (ACL):
Elnaz Davoodi and Leila Kosseim. 2017. Automatic Identification of AltLexes using Monolingual Parallel Corpora. In Proceedings of the International Conference Recent Advances in Natural Language Processing, RANLP 2017, pages 195–200, Varna, Bulgaria. INCOMA Ltd.
Cite (Informal):
Automatic Identification of AltLexes using Monolingual Parallel Corpora (Davoodi & Kosseim, RANLP 2017)
PDF:
https://doi.org/10.26615/978-954-452-049-6_027