Tomek Strzalkowski

Also published as: T. Strzalkowski, Tomek Strzalkowskl


2024

Social Convos: Capturing Agendas and Emotions on Social Media
Ankita Bhaumik | Ning Sa | Gregorios Katsios | Tomek Strzalkowski
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Social media platforms are popular tools for disseminating targeted information during major public events like elections or pandemics. Systematic analysis of the message traffic can provide valuable insights into prevailing opinions and social dynamics among different segments of the population. We are specifically interested in influence spread, and in particular whether more deliberate influence operations can be detected. However, filtering out the essential messages with telltale influence indicators from the extensive and often chaotic social media traffic is a major challenge. In this paper we present a novel approach to extract influence indicators from messages circulating among groups of users discussing particular topics. We build upon the concept of a convo to identify influential authors who are actively promoting some particular agenda around that topic within the group. We focus on two influence indicators: the (control of) agenda and the use of emotional language.

Uncovering Agendas: A Novel French & English Dataset for Agenda Detection on Social Media
Gregorios Katsios | Ning Sa | Ankita Bhaumik | Tomek Strzalkowski
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

The behavior and decision making of groups or communities can be dramatically influenced by individuals pushing particular agendas, e.g., to promote or disparage a person or an activity, to call for action, etc. In the examination of online influence campaigns, particularly those related to important political and social events, scholars often concentrate on identifying the sources responsible for setting and controlling the agenda (e.g., public media). In this article we present a methodology for detecting specific instances of agenda control through social media where annotated data is limited or non-existent. By using a modest corpus of Twitter messages centered on the 2022 French Presidential Elections, we carry out a comprehensive evaluation of various approaches and techniques that can be applied to this problem. Our findings demonstrate that by treating the task as a textual entailment problem, it is possible to overcome the requirement for a large annotated training dataset.
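The entailment framing can be illustrated with a minimal Python sketch, assuming the message serves as the premise and each agenda type named above (promoting, disparaging, calling for action) is phrased as a hypothesis about a target. The templates, the `detect_agendas` helper, and the 0.5 threshold are hypothetical; a real system would plug a trained NLI model into `score`, whereas the keyword stub here exists only so the sketch runs.

```python
from typing import Callable, Dict, List

# Hypothetical hypothesis templates, one per agenda type named in the abstract.
TEMPLATES: Dict[str, str] = {
    "promote": "This text promotes {target}.",
    "disparage": "This text disparages {target}.",
    "call_to_action": "This text calls for action concerning {target}.",
}

# (premise, hypothesis) -> probability that the premise entails the hypothesis.
# In practice this would wrap an NLI model; here it is left abstract.
EntailmentScorer = Callable[[str, str], float]


def detect_agendas(message: str, target: str, score: EntailmentScorer,
                   threshold: float = 0.5) -> List[str]:
    """Return every agenda label whose hypothesis the message is judged to entail."""
    labels = []
    for label, template in TEMPLATES.items():
        hypothesis = template.format(target=target)
        if score(message, hypothesis) >= threshold:
            labels.append(label)
    return labels


def toy_scorer(premise: str, hypothesis: str) -> float:
    """Keyword stub standing in for a real NLI model (demonstration only)."""
    if "promotes" in hypothesis and "vote for" in premise.lower():
        return 0.9
    return 0.1


print(detect_agendas("Vote for Candidate A on Sunday!", "Candidate A", toy_scorer))
# ['promote']
```

Because the labels come from hypothesis templates rather than from a classifier trained per label, adding an agenda type only requires writing a new template, which suggests why an entailment formulation can reduce the need for a large annotated training set.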

2023

Adapting Emotion Detection to Analyze Influence Campaigns on Social Media
Ankita Bhaumik | Andy Bernhardt | Gregorios Katsios | Ning Sa | Tomek Strzalkowski
Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis

Social media is an extremely potent tool for influencing public opinion, particularly during important events such as elections, pandemics, and national conflicts. Emotions are a crucial aspect of this influence, but detecting them accurately in the political domain is a significant challenge due to the lack of suitable emotion labels and training datasets. In this paper, we present a generalized approach to emotion detection that can be adapted to the political domain with minimal performance sacrifice. Our approach is designed to be easily integrated into existing models without the need for additional training or fine-tuning. We demonstrate the zero-shot and few-shot performance of our model on the 2017 French presidential elections and propose efficient emotion groupings that would aid in effectively analyzing influence campaigns and agendas on social media.

2022

BeSt: The Belief and Sentiment Corpus
Jennifer Tracey | Owen Rambow | Claire Cardie | Adam Dalton | Hoa Trang Dang | Mona Diab | Bonnie Dorr | Louise Guthrie | Magdalena Markowska | Smaranda Muresan | Vinodkumar Prabhakaran | Samira Shaikh | Tomek Strzalkowski
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present the BeSt corpus, which records cognitive state: who believes what (i.e., factuality), and who has what sentiment towards what. This corpus is inspired by similar source-and-target corpora, specifically MPQA and FactBank. The corpus comprises two genres, newswire and discussion forums, in three languages, Chinese (Mandarin), English, and Spanish. The corpus is distributed through the LDC.

Towards a Progression-Aware Autonomous Dialogue Agent
Abraham Sanders | Tomek Strzalkowski | Mei Si | Albert Chang | Deepanshu Dey | Jonas Braasch | Dakuo Wang
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recent advances in large-scale language modeling and generation have enabled the creation of dialogue agents that exhibit human-like responses in a wide range of conversational scenarios spanning a diverse set of tasks, from general chit-chat to focused goal-oriented discourse. While these agents excel at generating high-quality responses that are relevant to prior context, they suffer from a lack of awareness of the overall direction in which the conversation is headed, and the likelihood of task success inherent therein. Thus, we propose a framework in which dialogue agents can evaluate the progression of a conversation toward or away from desired outcomes, and use this signal to inform planning for subsequent responses. Our framework is composed of three key elements: (1) the notion of a “global” dialogue state (GDS) space, (2) a task-specific progression function (PF) computed in terms of a conversation’s trajectory through this space, and (3) a planning mechanism based on dialogue rollouts by which an agent may use progression signals to select its next response.
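Element (3) can be pictured with a toy Python sketch: each candidate response is scored by the mean progression over a few simulated rollouts, and the best-scoring one is chosen. Everything here is an assumption made for illustration: the one-dimensional state, the `progression` and `rollout` helpers, and the random future turns are hypothetical stand-ins, not the paper's GDS space, PF, or rollout model.

```python
import random
from statistics import mean
from typing import Dict, List

# Toy stand-in for a "global" dialogue state: a single number, where GOAL is
# the desired outcome. The real GDS space is far richer.
GOAL = 1.0


def progression(trajectory: List[float]) -> float:
    """Hypothetical progression function: how much closer the conversation
    ends up to GOAL than where it started (negative means moving away)."""
    return abs(GOAL - trajectory[0]) - abs(GOAL - trajectory[-1])


def rollout(state: float, effect: float, depth: int, rng: random.Random) -> List[float]:
    """Apply a candidate response, then simulate `depth` noisy future turns."""
    trajectory = [state, state + effect]
    for _ in range(depth):
        trajectory.append(trajectory[-1] + rng.uniform(-0.1, 0.1))
    return trajectory


def select_response(state: float, candidates: Dict[str, float],
                    n_rollouts: int = 20, depth: int = 3, seed: int = 0) -> str:
    """Pick the candidate whose rollouts have the highest mean progression."""
    rng = random.Random(seed)
    scores = {
        text: mean(progression(rollout(state, effect, depth, rng))
                   for _ in range(n_rollouts))
        for text, effect in candidates.items()
    }
    return max(scores, key=scores.get)


print(select_response(0.0, {"on-topic reply": 0.3, "digression": -0.2}))
# on-topic reply
```

Averaging over several rollouts, rather than trusting a single simulated continuation, is what lets the progression signal tolerate noise in how the rest of the conversation might unfold.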

2020

Active Defense Against Social Engineering: The Case for Human Language Technology
Adam Dalton | Ehsan Aghaei | Ehab Al-Shaer | Archna Bhatia | Esteban Castillo | Zhuo Cheng | Sreekar Dhaduvai | Qi Duan | Bryanna Hebenstreit | Md Mazharul Islam | Younes Karimi | Amir Masoumzadeh | Brodie Mather | Sashank Santhanam | Samira Shaikh | Alan Zemel | Tomek Strzalkowski | Bonnie J. Dorr
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management

We describe a system that supports natural language processing (NLP) components for active defenses against social engineering attacks. We deploy a pipeline of human language technology, including Ask and Framing Detection, Named Entity Recognition, Dialogue Engineering, and Stylometry. The system processes modern message formats through a plug-in architecture to accommodate innovative approaches for message analysis, knowledge representation and dialogue generation. The novelty of the system is that it uses NLP for cyber defense and engages the attacker using bots to elicit evidence to attribute to the attacker and to waste the attacker’s time and resources.

Adaptation of a Lexical Organization for Social Engineering Detection and Response Generation
Archna Bhatia | Adam Dalton | Brodie Mather | Sashank Santhanam | Samira Shaikh | Alan Zemel | Tomek Strzalkowski | Bonnie J. Dorr
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management

We present a paradigm for extensible lexicon development based on Lexical Conceptual Structure to support social engineering detection and response generation. We leverage the central notions of ask (elicitation of behaviors such as providing access to money) and framing (risk/reward implied by the ask). We demonstrate improvements in ask/framing detection through refinements to our lexical organization and show that response generation qualitatively improves as ask/framing detection performance improves. The paradigm presents a systematic and efficient approach to resource adaptation for improved task-specific performance.

Email Threat Detection Using Distinct Neural Network Approaches
Esteban Castillo | Sreekar Dhaduvai | Peng Liu | Kartik-Singh Thakur | Adam Dalton | Tomek Strzalkowski
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management

This paper describes different approaches to detect malicious content in email interactions through a combination of machine learning and natural language processing tools. Specifically, several neural network designs are tested on word embedding representations to detect suspicious messages and separate them from non-suspicious, benign email. The proposed approaches are trained and tested on distinct email collections, including datasets constructed from publicly available corpora (such as Enron, APWG, etc.) as well as several smaller, non-public datasets used in recent government evaluations. Experimental results show that back-propagation both with and without recurrent neural layers outperforms current state-of-the-art techniques that include supervised learning algorithms with stylometric elements of texts as features. Our results also demonstrate that word embedding vectors are an effective means for capturing certain aspects of text meaning that can be teased out through machine learning in non-linear/complex neural networks, in order to obtain highly accurate detection of malicious emails based on email text alone.

Learning to Plan and Realize Separately for Open-Ended Dialogue Systems
Sashank Santhanam | Zhuo Cheng | Brodie Mather | Bonnie Dorr | Archna Bhatia | Bryanna Hebenstreit | Alan Zemel | Adam Dalton | Tomek Strzalkowski | Samira Shaikh
Findings of the Association for Computational Linguistics: EMNLP 2020

Achieving true human-like ability to conduct a conversation remains an elusive goal for open-ended dialogue systems. We posit this is because extant approaches towards natural language generation (NLG) are typically construed as end-to-end architectures that do not adequately model human generation processes. To investigate, we decouple generation into two separate phases: planning and realization. In the planning phase, we train two planners to generate plans for response utterances. The realization phase uses response plans to produce an appropriate response. Through rigorous evaluations, both automated and human, we demonstrate that decoupling the process into planning and realization performs better than an end-to-end approach.

Generating Ethnographic Models from Communities’ Online Data
Tomek Strzalkowski | Anna Newheiser | Nathan Kemper | Ning Sa | Bharvee Acharya | Gregorios Katsios
Proceedings of the Second Workshop on Figurative Language Processing

In this paper we describe a computational ethnography study to demonstrate how machine learning techniques can be utilized to exploit bias resident in language data produced by communities with online presence. Specifically, we leverage the use of figurative language (i.e., the choice of metaphors) in online text (e.g., news media, blogs) produced by distinct communities to obtain models of community worldviews that can be shown to be distinctly biased and thus different from other communities' models. We automatically construct metaphor-based community models for two distinct scenarios: gun rights and marriage equality. We then conduct a series of experiments to validate the hypothesis that the metaphors found in each community's online language convey the bias in the community's worldview.

2018

Gaining and Losing Influence in Online Conversation
Arun Sharma | Tomek Strzalkowski
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

ANEW+: Automatic Expansion and Validation of Affective Norms of Words Lexicons in Multiple Languages
Samira Shaikh | Kit Cho | Tomek Strzalkowski | Laurie Feldman | John Lien | Ting Liu | George Aaron Broadwell
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this article we describe our method of automatically expanding an existing lexicon of words with affective valence scores. The automatic expansion process was done in English. In addition, we describe our procedure for automatically creating lexicons in languages where such resources may not previously exist. The foreign languages we discuss in this paper are Spanish, Russian and Farsi. We also describe the procedures to systematically validate our newly created resources. The main contributions of this work are: 1) A general method for expansion and creation of lexicons with scores of words on psychological constructs such as valence, arousal or dominance; and 2) a procedure for ensuring validity of the newly constructed resources.

The Validation of MRCPD Cross-language Expansions on Imageability Ratings
Ting Liu | Kit Cho | Tomek Strzalkowski | Samira Shaikh | Mehrdad Mirzaei
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this article, we present a method to validate a multilingual (English, Spanish, Russian, and Farsi) corpus of imageability ratings automatically expanded from MRCPD (Liu et al., 2014). Because human-assessed imageability ratings were lacking, and because concreteness ratings correlate highly with imageability ratings (e.g., r = .83), we used the concreteness-rating corpus of Brysbaert et al. (2014) to validate our English MRCPD+. For the same reason, we built a small corpus with human imageability assessments to validate the corpora in the other languages. The results show that the automatically expanded imageability ratings are highly correlated with human assessments in all four languages, which demonstrates that our automatic expansion method is valid and robust. We believe these new resources can be of significant interest to the research community, particularly in natural language processing and computational sociolinguistics.
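The validation described above comes down to correlating automatically expanded ratings with human ratings on the words both resources cover. A minimal Python sketch of that step follows, using a hand-written Pearson correlation; the `validate` helper is an illustrative assumption, not the authors' code, and no data from the paper is reproduced.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a sample with zero variance has no defined correlation")
    return cov / (sx * sy)


def validate(expanded: Dict[str, float], human: Dict[str, float]) -> float:
    """Correlate expanded ratings with human ratings on the words both cover."""
    shared = sorted(expanded.keys() & human.keys())
    return pearson_r([expanded[w] for w in shared], [human[w] for w in shared])
```

Restricting the comparison to shared words matters because the two resources need not use the same scale (for example, MRCPD-style scores versus a concreteness rating scale); Pearson's r is unaffected by such linear rescaling.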

2015

A New Dataset and Evaluation for Belief/Factuality
Vinodkumar Prabhakaran | Tomas By | Julia Hirschberg | Owen Rambow | Samira Shaikh | Tomek Strzalkowski | Jennifer Tracey | Michael Arrigo | Rupayan Basu | Micah Clark | Adam Dalton | Mona Diab | Louise Guthrie | Anna Prokofieva | Stephanie Strassel | Gregory Werner | Yorick Wilks | Janyce Wiebe
Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics

Understanding Cultural Conflicts using Metaphors and Sociolinguistic Measures of Influence
Samira Shaikh | Tomek Strzalkowski | Sarah Taylor | John Lien | Ting Liu | George Aaron Broadwell | Laurie Feldman | Boris Yamrom | Kit Cho | Yuliya Peshkova
Proceedings of the Third Workshop on Metaphor in NLP

2014

Computing Affect in Metaphors
Tomek Strzalkowski | Samira Shaikh | Kit Cho | George Aaron Broadwell | Laurie Feldman | Sarah Taylor | Boris Yamrom | Ting Liu | Ignacio Cases | Yuliya Peshkova | Kyle Elliot
Proceedings of the Second Workshop on Metaphor in NLP

Discovering Conceptual Metaphors using Source Domain Spaces
Samira Shaikh | Tomek Strzalkowski | Kit Cho | Ting Liu | George Aaron Broadwell | Laurie Feldman | Sarah Taylor | Boris Yamrom | Ching-Sheng Lin | Ning Sa | Ignacio Cases | Yuliya Peshkova | Kyle Elliot
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)

Automatic Expansion of the MRC Psycholinguistic Database Imageability Ratings
Ting Liu | Kit Cho | G. Aaron Broadwell | Samira Shaikh | Tomek Strzalkowski | John Lien | Sarah Taylor | Laurie Feldman | Boris Yamrom | Nick Webb | Umit Boz | Ignacio Cases | Ching-sheng Lin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Recent studies in metaphor extraction across several languages (Broadwell et al., 2013; Strzalkowski et al., 2013) have shown that word imageability ratings are highly correlated with the presence of metaphors in text. Information about imageability of words can be obtained from the MRC Psycholinguistic Database (MRCPD) for English words and Léxico Informatizado del Español Programa (LEXESP) for Spanish words, which are collections of human ratings obtained in a series of controlled surveys. Unfortunately, word imageability ratings were collected for only a limited number of words (9,240 words in English and 6,233 in Spanish), and they are not available at all in the other two languages studied, Russian and Farsi. The present study describes an automated method for expanding the MRCPD by conferring imageability ratings over the synonyms and hyponyms of existing MRCPD words, as identified in WordNet. The result is an expanded MRCPD+ database with imageability scores for more than 100,000 words. The appropriateness of this expansion process is assessed by examining the structural coherence of the expanded set and by validating the expanded lexicon against human judgment. Finally, the performance of the metaphor extraction system is shown to improve significantly with the expanded database. This paper describes the process for English MRCPD+ and the resulting lexical resource. The process is analogous for other languages.
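The expansion step, conferring a rated word's imageability score on its synonyms and hyponyms, can be sketched in Python as follows. To stay self-contained, the sketch takes a hypothetical `related` dictionary in place of WordNet lookups, and it averages scores when an unrated word inherits from several rated neighbors; that averaging rule, and the choice never to overwrite an existing human rating, are assumptions rather than details taken from the paper.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def expand_ratings(seed: Dict[str, float],
                   related: Dict[str, List[str]]) -> Dict[str, float]:
    """Propagate each rated word's score to its related (synonym/hyponym) words.

    Words that already carry a human rating keep it; an unrated word reached
    from several rated words receives the mean of their scores (an assumption).
    """
    inherited: Dict[str, List[float]] = defaultdict(list)
    for word, score in seed.items():
        for neighbor in related.get(word, []):
            if neighbor not in seed:
                inherited[neighbor].append(score)
    expanded = dict(seed)
    for word, scores in inherited.items():
        expanded[word] = mean(scores)
    return expanded


# Hypothetical toy data, not MRCPD values.
seed = {"dog": 600.0, "animal": 500.0}
related = {"dog": ["puppy", "hound"], "animal": ["puppy", "beast"]}
print(expand_ratings(seed, related))
# {'dog': 600.0, 'animal': 500.0, 'puppy': 550.0, 'hound': 600.0, 'beast': 500.0}
```

With NLTK installed, the `related` mapping could be built from WordNet synsets' `lemma_names()` and `hyponyms()`; how MRCPD+ handles the multiple senses of a word is not described in the abstract.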

A Multi-Cultural Repository of Automatically Discovered Linguistic and Conceptual Metaphors
Samira Shaikh | Tomek Strzalkowski | Ting Liu | George Aaron Broadwell | Boris Yamrom | Sarah Taylor | Laurie Feldman | Kit Cho | Umit Boz | Ignacio Cases | Yuliya Peshkova | Ching-Sheng Lin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this article, we present details about our ongoing work towards building a repository of Linguistic and Conceptual Metaphors. This resource is being developed as part of our research effort into the large-scale detection of metaphors from unrestricted text. We have stored a large amount of automatically extracted metaphors in American English, Mexican Spanish, Russian and Iranian Farsi in a relational database, along with pertinent metadata associated with these metaphors. A substantial subset of the contents of our repository has been systematically validated via rigorous social science experiments. Using information stored in the repository, we are able to posit certain claims in a cross-cultural context about how peoples in these cultures (America, Mexico, Russia and Iran) view particular concepts related to Governance and Economic Inequality through the use of metaphor. Researchers in the field can use this resource as a reference of typical metaphors used across these cultures. In addition, it can be used to recognize metaphors of the same form or pattern, in other domains of research.

2013

Robust Extraction of Metaphor from Novel Data
Tomek Strzalkowski | George Aaron Broadwell | Sarah Taylor | Laurie Feldman | Samira Shaikh | Ting Liu | Boris Yamrom | Kit Cho | Umit Boz | Ignacio Cases | Kyle Elliot
Proceedings of the First Workshop on Metaphor in NLP

Topical Positioning: A New Method for Predicting Opinion Changes in Conversation
Ching-Sheng Lin | Samira Shaikh | Jennifer Stromer-Galley | Jennifer Crowley | Tomek Strzalkowski | Veena Ravishankar
Proceedings of the Workshop on Language Analysis in Social Media

2012

Modeling Leadership and Influence in Multi-party Online Discourse
Tomek Strzalkowski | Samira Shaikh | Ting Liu | George Aaron Broadwell | Jenny Stromer-Galley | Sarah Taylor | Umit Boz | Veena Ravishankar | Xiaoai Ren
Proceedings of COLING 2012

Revealing Contentious Concepts Across Social Groups
Ching-Sheng Lin | Zumrut Akcam | Samira Shaikh | Sharon Small | Ken Stahl | Tomek Strzalkowski | Nick Webb
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, a computational model based on concept polarity is proposed to investigate the influence of communications across the diacultural groups. The hypothesis of this work is that there are communities or groups which can be characterized by a network of concepts and the corresponding valuations of those concepts that are agreed upon by the members of the community. We apply an existing research tool, ECO, to generate text representative of each community and create community specific Valuation Concept Networks (VCN). We then compare VCNs across the communities, to attempt to find contentious concepts, which could subsequently be the focus of further exploration as points of contention between the two communities. A prototype, CPAM (Changing Positions, Altering Minds), was implemented as a proof of concept for this approach. The experiment was conducted using blog data from pro-Palestinian and pro-Israeli communities. A potential application of this method and future work are discussed as well.
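As a rough picture of comparing Valuation Concept Networks, the Python sketch below reduces each community's network to a map from concept to valuation score and flags concepts that the two communities value with opposite signs and a sufficiently large gap. The flat dictionary, the sign test, the `min_gap` threshold, and the placeholder concept names are simplifying assumptions; the sketch ignores the links between concepts that a VCN contains and is not the CPAM implementation.

```python
from typing import Dict, List, Tuple


def contentious_concepts(vcn_a: Dict[str, float], vcn_b: Dict[str, float],
                         min_gap: float = 1.0) -> List[Tuple[str, float, float]]:
    """Concepts valued with opposite signs by two communities, largest gap first."""
    result = []
    for concept in sorted(vcn_a.keys() & vcn_b.keys()):
        a, b = vcn_a[concept], vcn_b[concept]
        # Opposite polarity (product < 0) and a large enough disagreement.
        if a * b < 0 and abs(a - b) >= min_gap:
            result.append((concept, a, b))
    return sorted(result, key=lambda t: abs(t[1] - t[2]), reverse=True)


# Hypothetical valuations on a -1..1 scale, not data from the paper.
community_a = {"concept_x": 0.8, "concept_y": 0.5, "concept_z": -0.9}
community_b = {"concept_x": -0.7, "concept_y": 0.6, "concept_z": 0.4}
print(contentious_concepts(community_a, community_b))
# [('concept_x', 0.8, -0.7), ('concept_z', -0.9, 0.4)]
```

Concepts that both communities value the same way (here `concept_y`) drop out, leaving a ranked shortlist of candidate points of contention for further exploration.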

Extending the MPC corpus to Chinese and Urdu - A Multiparty Multi-Lingual Chat Corpus for Modeling Social Phenomena in Language
Ting Liu | Samira Shaikh | Tomek Strzalkowski | Aaron Broadwell | Jennifer Stromer-Galley | Sarah Taylor | Umit Boz | Xiaoai Ren | Jingsi Wu
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

In this paper, we report our efforts in building a multi-lingual multi-party online chat corpus in order to develop a firm understanding of a set of social constructs such as agenda control, influence, and leadership, as well as to computationally model such constructs in online interactions. These automated models will help capture the dialogue dynamics that are essential for developing, among others, realistic human-machine dialogue systems, including autonomous virtual chat agents. In this paper, we first introduce our experiment design and data collection method in Chinese and Urdu, and then report on the current stage of our data collection. We annotated the collected corpus on four levels: communication links, dialogue acts, local topics, and meso-topics. Results from the analyses of annotated data on different languages indicate some interesting phenomena, which are reported in this paper.

Bootstrapping Events and Relations from Text
Ting Liu | Tomek Strzalkowski
Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

Multi-Modal Annotation of Quest Games in Second Life
Sharon Gower Small | Jennifer Strommer-Galley | Tomek Strzalkowski
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

MPC: A Multi-Party Chat Corpus for Modeling Social Phenomena in Discourse
Samira Shaikh | Tomek Strzalkowski | Aaron Broadwell | Jennifer Stromer-Galley | Sarah Taylor | Nick Webb
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In this paper, we describe our experience with collecting and creating an annotated corpus of multi-party online conversations in a chat-room environment. This effort is part of a larger project to develop computational models of social phenomena such as agenda control, influence, and leadership in on-line interactions. Such models will help capture the dialogue dynamics that are essential for developing, among others, realistic human-machine dialogue systems, including autonomous virtual chat agents. In this paper we describe the data collection method used and the characteristics of the initial dataset of English chat. We have devised a multi-tiered collection process in which the subjects start from simple, free-flowing conversations and progress towards more complex and structured interactions. In this paper, we report on the first two stages of this process, which were recently completed. The third, large-scale collection effort is currently being conducted. All English dialogue has been annotated at four levels: communication links, dialogue acts, local topics and meso-topics. Some details of these annotations will be discussed later in this paper, although a full description is impossible within the scope of this article.

Proceedings of the ACL 2010 Student Research Workshop
Seniz Demir | Jan Raab | Nils Reiter | Marketa Lopatkova | Tomek Strzalkowski
Proceedings of the ACL 2010 Student Research Workshop

VCA: An Experiment with a Multiparty Virtual Chat Agent
Samira Shaikh | Tomek Strzalkowski | Sarah Taylor | Nick Webb
Proceedings of the 2010 Workshop on Companionable Dialogue Systems

Modeling Socio-Cultural Phenomena in Discourse
Tomek Strzalkowski | George Aaron Broadwell | Jennifer Stromer-Galley | Samira Shaikh | Sarah Taylor | Nick Webb
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)

2006

Utilizing Co-Occurrence of Answers in Question Answering
Min Wu | Tomek Strzalkowski
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2004

Data-Driven Strategies for an Automated Dialogue System
Hilda Hardy | Tomek Strzalkowski | Min Wu | Cristian Ursu | Nick Webb | Alan Biermann | R. Bryce Inouye | Ashley McKenzie
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

Designing a Realistic Evaluation of an End-to-end Interactive Question Answering System
Nina Wacholder | Sharon Small | Bing Bai | Diane Kelly | Robert Rittman | Sean Ryan | Robert Salkin | Peng Song | Ying Sun | Ting Liu | Paul Kantor | Tomek Strzalkowski
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

HITIQA: Towards Analytical Question Answering
Sharon Small | Tomek Strzalkowski | Ting Liu | Sean Ryan | Robert Salkin | Nobuyuki Shimizu | Paul Kantor | Diane Kelly | Robert Rittman | Nina Wacholder
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

HITIQA: A Data Driven Approach to Interactive Analytical Question Answering
Sharon Small | Tomek Strzalkowski
Proceedings of HLT-NAACL 2004: Short Papers

HITIQA: Scenario Based Question Answering
Sharon Small | Tomek Strzalkowski | Ting Liu | Sean Ryan | Robert Salkin | Nobuyuki Shimizu | Paul Kantor | Diane Kelly | Robert Rittman | Nina Wacholder | Boris Yamrom
Proceedings of the Workshop on Pragmatics of Question Answering at HLT-NAACL 2004

2003

Automatically Predicting Information Quality in News Documents
Rong Tang | Kwong Bor Ng | Tomek Strzalkowski | Paul B. Kantor
Companion Volume of the Proceedings of HLT-NAACL 2003 - Short Papers

Dialogue Management for an Automated Multilingual Call Center
Hilda Hardy | Tomek Strzalkowski | Min Wu
Proceedings of the HLT-NAACL 2003 Workshop on Research Directions in Dialogue Processing

HITIQA: An Interactive Question Answering System: A Preliminary Report
Sharon Small | Ting Liu | Nobuyuki Shimizu | Tomek Strzalkowski
Proceedings of the ACL 2003 Workshop on Multilingual Summarization and Question Answering

2000

PartsID: A Dialogue-Based System for Identifying Parts for Medical Systems
Amit Bagga | Tomek Strzalkowski | G. Bowden Wise
Sixth Applied Natural Language Processing Conference

Evaluating Summaries for Multiple Documents in an Interactive Environment
Gees C. Stein | Tomek Strzalkowski | G. Bowden Wise | Amit Bagga
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1998

Enhancing Detection through Linguistic Indexing and Topic Expansion
Tomek Strzalkowski | Gees C. Stein | G. Bowden Wise
TIPSTER TEXT PROGRAM PHASE III: Proceedings of a Workshop held at Baltimore, Maryland, October 13-15, 1998

A Text-Extraction Based Summarizer
Tomek Strzalkowski | Gees C. Stein | G. Bowden Wise
TIPSTER TEXT PROGRAM PHASE III: Proceedings of a Workshop held at Baltimore, Maryland, October 13-15, 1998

Summarization-based Query Expansion in Information Retrieval
Tomek Strzalkowski | Jin Wang | Bowden Wise
COLING 1998 Volume 2: The 17th International Conference on Computational Linguistics

Summarization-based Query Expansion in Information Retrieval
Tomek Strzalkowski | Jin Wang | Bowden Wise
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 2

1997

Building Effective Queries In Natural Language Information Retrieval
Tomek Strzalkowski | Fang Lin | Jose Perez-Carballo | Jin Wang
Fifth Conference on Applied Natural Language Processing

A Natural Language Correction Model for Continuous Speech Recognition
Tomek Strzalkowski | Ronald Brandow
Fifth Workshop on Very Large Corpora

1996

Natural Language Information Retrieval: TIPSTER-2 Final Report
Tomek Strzalkowski
TIPSTER TEXT PROGRAM PHASE II: Proceedings of a Workshop held at Vienna, Virginia, May 6-8, 1996

Integration of Document Detection and Information Extraction
Louise Guthrie | Tomek Strzalkowski | Jin Wang | Fang Lin
TIPSTER TEXT PROGRAM PHASE II: Proceedings of a Workshop held at Vienna, Virginia, May 6-8, 1996

A Self-Learning Universal Concept Spotter
Tomek Strzalkowski | Jin Wang
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics

1994

Building a Lexical Domain Map From Text Corpora
Tomek Strzalkowski
COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics

Document Representation in Natural Language Text Retrieval
Tomek Strzalkowski
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

Robust Text Processing and Information Retrieval
Tomek Strzalkowski
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

Robust Text Processing in Automated Information Retrieval
Tomek Strzalkowski
Fourth Conference on Applied Natural Language Processing

1993

Evaluation of TTP Parser: A Preliminary Report
Tomek Strzalkowski | Peter G. N. Scheyen
Proceedings of the Third International Workshop on Parsing Technologies

TTP (Tagged Text Parser) is a fast and robust natural language parser specifically designed to process vast quantities of unrestricted text. TTP can analyze written text at the speed of approximately 0.3 sec/sentence, or 73 words per second. An important novel feature of the TTP parser is that it is equipped with a skip-and-fit recovery mechanism that allows for fast closing of more difficult sub-constituents after a preset amount of time has elapsed without producing a parse. Although a complete analysis is attempted for each sentence, the parser may occasionally ignore fragments of input to resume "normal" processing after skipping a few words. These fragments are later analyzed separately and attached as incomplete constituents to the main parse tree. TTP has recently been evaluated against several leading parsers. While no formal numbers were released (a formal evaluation is planned later this year), TTP has performed surprisingly well. The main argument of this paper is that TTP can provide a substantial gain in parsing speed while giving up relatively little in terms of the quality of the output it produces. This property allows TTP to be used effectively in parsing large volumes of text.
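The two throughput figures above imply an average test-sentence length; this back-of-the-envelope check is derived here and is not stated in the paper:

$$
73\ \frac{\text{words}}{\text{s}} \times 0.3\ \frac{\text{s}}{\text{sentence}} \approx 21.9\ \frac{\text{words}}{\text{sentence}}
$$

So the quoted speeds are mutually consistent if the evaluated sentences averaged roughly 22 words.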

Robust Text Processing in Automated Information Retrieval
Tomek Strzalkowski
Very Large Corpora: Academic and Industrial Perspectives

Robust Text Processing and Information Retrieval
Tomek Strzalkowski
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993

1992

Comparing Two Grammar-Based Generation Algorithms: A Case Study
Miroslav Martinovic | Tomek Strzalkowski
30th Annual Meeting of the Association for Computational Linguistics

Information Retrieval Using Robust Natural Language Processing
Tomek Strzalkowski | Barbara Vauthey
30th Annual Meeting of the Association for Computational Linguistics

Information Retrieval Using Robust Natural Language Processing
Tomek Strzalkowski
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

TTP: A Fast and Robust Parser for Natural Language
Tomek Strzalkowski
COLING 1992 Volume 1: The 14th International Conference on Computational Linguistics

1991

A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars
E. Black | S. Abney | D. Flickenger | C. Gdaniec | R. Grishman | P. Harrison | D. Hindle | R. Ingria | F. Jelinek | J. Klavans | M. Liberman | M. Marcus | S. Roukos | B. Santorini | T. Strzalkowski
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

Fast Text Processing for Information Retrieval
Tomek Strzalkowski | Barbara Vauthey
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

A General Computational Method for Grammar Inversion
Tomek Strzalkowski
Reversible Grammar in Natural Language Processing

1990

How to Invert a Natural Language Parser Into an Efficient Generator: An Algorithm for Logic Grammars
Tomek Strzalkowskl
COLING 1990 Volume 2: Papers presented to the 13th International Conference on Computational Linguistics

Automated Inversion of Logic Grammars for Generation
Tomek Strzalkowski | Ping Peng
28th Annual Meeting of the Association for Computational Linguistics

1989

Non-singular Concepts in Natural Language Discourse
Tomek Strzalkowski | Nick Cercone
Computational Linguistics, Volume 15, Number 3, September 1989

1986

An Approach to Non-Singular Terms in Discourse
Tomek Strzalkowski
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics

1983

Natural Language Information Retrieval System Dialog
L. Bole | K. Kochut | A. Lesniewski | T. Strzalkowski
First Conference of the European Chapter of the Association for Computational Linguistics
