Sandra Szasz


2012

The CONCISUS Corpus of Event Summaries
Horacio Saggion | Sandra Szasz
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Text summarization and information extraction systems require adaptation to new domains and languages. This adaptation usually depends on the availability of language resources such as corpora. In this paper we present a comparable corpus in Spanish and English for the study of cross-lingual information extraction and summarization: the CONCISUS Corpus. It is a rich human-annotated dataset composed of comparable event summaries in Spanish and English covering four domains: aviation accidents, rail accidents, earthquakes, and terrorist attacks. In addition to the monolingual summaries in English and Spanish, we provide automatic translations and "comparable" full reports of the events. The human annotations are concepts marked in the textual sources that represent the key information associated with each event type. The dataset has also been annotated using text processing pipelines. It is being made freely available to the research community.

2011

Multi-domain Cross-lingual Information Extraction from Clean and Noisy Texts
Horacio Saggion | Sandra Szasz
Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology

2010

Human Language Technology for Text-based Analysis of Psychotherapy Sessions in the Spanish Language
Horacio Saggion | Elena Stein-Sparvieri | David Maldavsky | Sandra Szasz
Proceedings of the NAACL HLT 2010 Young Investigators Workshop on Computational Approaches to Languages of the Americas

NLP Resources for the Analysis of Patient/Therapist Interviews
Horacio Saggion | Elena Stein-Sparvieri | David Maldavsky | Sandra Szasz
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We present a set of tools and resources for the analysis of interviews during psychotherapy sessions. One of the main components of the work is a dictionary-based text interpretation tool for the Spanish language. The tool is designed to identify a subset of Freudian drives in patient and therapist discourse.