Automatic Expansion of the MRC Psycholinguistic Database Imageability Ratings

Ting Liu, Kit Cho, G. Aaron Broadwell, Samira Shaikh, Tomek Strzalkowski, John Lien, Sarah Taylor, Laurie Feldman, Boris Yamrom, Nick Webb, Umit Boz, Ignacio Cases, Ching-sheng Lin


Abstract
Recent studies in metaphor extraction across several languages (Broadwell et al., 2013; Strzalkowski et al., 2013) have shown that word imageability ratings are highly correlated with the presence of metaphors in text. Information about imageability of words can be obtained from the MRC Psycholinguistic Database (MRCPD) for English words and Léxico Informatizado del Español Programa (LEXESP) for Spanish words, which is a collection of human ratings obtained in a series of controlled surveys. Unfortunately, word imageability ratings were collected for only a limited number of words: 9,240 words in English, 6,233 in Spanish; and are unavailable at all in the other two languages studied: Russian and Farsi. The present study describes an automated method for expanding the MRCPD by conferring imageability ratings over the synonyms and hyponyms of existing MRCPD words, as identified in Wordnet. The result is an expanded MRCPD+ database with imagea-bility scores for more than 100,000 words. The appropriateness of this expansion process is assessed by examining the structural coherence of the expanded set and by validating the expanded lexicon against human judgment. Finally, the performance of the metaphor extraction system is shown to improve significantly with the expanded database. This paper describes the process for English MRCPD+ and the resulting lexical resource. The process is analogous for other languages.
Anthology ID:
L14-1186
Volume:
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Month:
May
Year:
2014
Address:
Reykjavik, Iceland
Editors:
Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
Venue:
LREC
SIG:
Publisher:
European Language Resources Association (ELRA)
Note:
Pages:
2800–2805
Language:
URL:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/178_Paper.pdf
DOI:
Bibkey:
Cite (ACL):
Ting Liu, Kit Cho, G. Aaron Broadwell, Samira Shaikh, Tomek Strzalkowski, John Lien, Sarah Taylor, Laurie Feldman, Boris Yamrom, Nick Webb, Umit Boz, Ignacio Cases, and Ching-sheng Lin. 2014. Automatic Expansion of the MRC Psycholinguistic Database Imageability Ratings. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 2800–2805, Reykjavik, Iceland. European Language Resources Association (ELRA).
Cite (Informal):
Automatic Expansion of the MRC Psycholinguistic Database Imageability Ratings (Liu et al., LREC 2014)
Copy Citation:
PDF:
http://www.lrec-conf.org/proceedings/lrec2014/pdf/178_Paper.pdf