2024
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
Archna Bhatia | Gosse Bouma | A. Seza Doğruöz | Kilian Evang | Marcos Garcia | Voula Giouli | Lifeng Han | Joakim Nivre | Alexandre Rademaker
Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024
UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies
Leonie Weissweiler | Nina Böbel | Kirian Guiller | Santiago Herrera | Wesley Scivetti | Arthur Lorenzi | Nurit Melnik | Archna Bhatia | Hinrich Schütze | Lori Levin | Amir Zeldes | Joakim Nivre | William Croft | Nathan Schneider
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages. However, the UD annotations do not tell the full story. Grammatical constructions that convey meaning through a particular combination of several morphosyntactic elements—for example, interrogative sentences with special markers and/or word orders—are not labeled holistically. We argue for (i) augmenting UD annotations with a ‘UCxn’ annotation layer for such meaning-bearing grammatical constructions, and (ii) approaching this in a typologically informed way so that morphosyntactic strategies can be compared across languages. As a case study, we consider five construction families in ten languages, identifying instances of each construction in UD treebanks through the use of morphosyntactic patterns. In addition to findings regarding these particular constructions, our study yields important insights on methodology for describing and identifying constructions in language-general and language-particular ways, and lays the foundation for future constructional enrichment of UD treebanks.
2023
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
Archna Bhatia | Kilian Evang | Marcos Garcia | Voula Giouli | Lifeng Han | Shiva Taslimipoor
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
PARSEME corpus release 1.3
Agata Savary | Cherifa Ben Khelil | Carlos Ramisch | Voula Giouli | Verginica Barbu Mititelu | Najet Hadj Mohamed | Cvetana Krstev | Chaya Liebeskind | Hongzhi Xu | Sara Stymne | Tunga Güngör | Thomas Pickard | Bruno Guillaume | Eduard Bejček | Archna Bhatia | Marie Candito | Polona Gantar | Uxoa Iñurrieta | Albert Gatt | Jolanta Kovalevskaite | Timm Lichte | Nikola Ljubešić | Johanna Monti | Carla Parra Escartín | Mehrnoush Shamsfard | Ivelina Stoyanova | Veronika Vincze | Abigail Walsh
Proceedings of the 19th Workshop on Multiword Expressions (MWE 2023)
We present version 1.3 of the PARSEME multilingual corpus annotated with verbal multiword expressions. Since the previous version, new languages have joined the undertaking of creating such a resource, some of the already existing corpora have been enriched with new annotated texts, while others have been enhanced in various ways. The PARSEME multilingual corpus now represents 26 languages. All monolingual corpora therein use the Universal Dependencies v.2 tagset. They are (re-)split observing the PARSEME v.1.2 standard, which puts emphasis on unseen VMWEs. With the current iteration, the corpus release process has been detached from shared tasks; instead, a process for continuous improvement and systematic releases has been introduced.
2022
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
Archna Bhatia | Paul Cook | Shiva Taslimipoor | Marcos Garcia | Carlos Ramisch
Proceedings of the 18th Workshop on Multiword Expressions @LREC2022
2021
Towards the Development of Speech-Based Measures of Stress Response in Individuals
Archna Bhatia | Toshiya Miyatsu | Peter Pirolli
Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access
Psychological and physiological stress in the environment can induce a different stress response in different individuals. Given the causal relationship between stress, mental health, and psychopathologies, as well as its impact on individuals’ executive functioning and performance, identifying the extent of stress response in individuals can be useful for providing targeted support to those who are in need. In this paper, we identify and validate features in speech that can be used as indicators of stress response in individuals to develop speech-based measures of stress response. We evaluate the effectiveness of two types of tasks used for collecting speech samples in developing stress response measures, namely the Read Speech Task and the Open-Ended Question Task. Participants completed these tasks, along with the verbal fluency task (an established measure of executive functioning), before and after clinically validated stress induction to see if the changes in the speech-based features are associated with the stress-induced decline in executive functioning. Further, we supplement our analyses with an extensive, external assessment of the individuals’ stress tolerance in real life to validate the usefulness of the speech-based measures in predicting meaningful outcomes outside of the experimental setting.
2020
Proceedings of the Third International Workshop on Spatial Language Understanding
Parisa Kordjamshidi | Archna Bhatia | Malihe Alikhani | Jason Baldridge | Mohit Bansal | Marie-Francine Moens
Proceedings of the Third International Workshop on Spatial Language Understanding
Edition 1.2 of the PARSEME Shared Task on Semi-supervised Identification of Verbal Multiword Expressions
Carlos Ramisch | Agata Savary | Bruno Guillaume | Jakub Waszczuk | Marie Candito | Ashwini Vaidya | Verginica Barbu Mititelu | Archna Bhatia | Uxoa Iñurrieta | Voula Giouli | Tunga Güngör | Menghan Jiang | Timm Lichte | Chaya Liebeskind | Johanna Monti | Renata Ramisch | Sara Stymne | Abigail Walsh | Hongzhi Xu
Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons
We present edition 1.2 of the PARSEME shared task on identification of verbal multiword expressions (VMWEs). Lessons learned from previous editions indicate that VMWEs have low ambiguity, and that the major challenge lies in identifying test instances never seen in the training data. Therefore, this edition focuses on unseen VMWEs. We have split annotated corpora so that the test corpora contain around 300 unseen VMWEs, and we provide non-annotated raw corpora to be used by complementary discovery methods. We released annotated and raw corpora in 14 languages, and this semi-supervised challenge attracted 7 teams who submitted 9 system results. This paper describes the effort of corpus creation, the task design, and the results obtained by the participating systems, especially their performance on unseen expressions.
Learning to Plan and Realize Separately for Open-Ended Dialogue Systems
Sashank Santhanam | Zhuo Cheng | Brodie Mather | Bonnie Dorr | Archna Bhatia | Bryanna Hebenstreit | Alan Zemel | Adam Dalton | Tomek Strzalkowski | Samira Shaikh
Findings of the Association for Computational Linguistics: EMNLP 2020
Achieving true human-like ability to conduct a conversation remains an elusive goal for open-ended dialogue systems. We posit this is because extant approaches towards natural language generation (NLG) are typically construed as end-to-end architectures that do not adequately model human generation processes. To investigate, we decouple generation into two separate phases: planning and realization. In the planning phase, we train two planners to generate plans for response utterances. The realization phase uses response plans to produce an appropriate response. Through rigorous evaluations, both automated and human, we demonstrate that decoupling the process into planning and realization performs better than an end-to-end approach.
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management
Archna Bhatia | Samira Shaikh
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management
Active Defense Against Social Engineering: The Case for Human Language Technology
Adam Dalton | Ehsan Aghaei | Ehab Al-Shaer | Archna Bhatia | Esteban Castillo | Zhuo Cheng | Sreekar Dhaduvai | Qi Duan | Bryanna Hebenstreit | Md Mazharul Islam | Younes Karimi | Amir Masoumzadeh | Brodie Mather | Sashank Santhanam | Samira Shaikh | Alan Zemel | Tomek Strzalkowski | Bonnie J. Dorr
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management
We describe a system that supports natural language processing (NLP) components for active defenses against social engineering attacks. We deploy a pipeline of human language technology, including Ask and Framing Detection, Named Entity Recognition, Dialogue Engineering, and Stylometry. The system processes modern message formats through a plug-in architecture to accommodate innovative approaches for message analysis, knowledge representation and dialogue generation. The novelty of the system is that it uses NLP for cyber defense and engages the attacker using bots to elicit evidence to attribute to the attacker and to waste the attacker’s time and resources.
Adaptation of a Lexical Organization for Social Engineering Detection and Response Generation
Archna Bhatia | Adam Dalton | Brodie Mather | Sashank Santhanam | Samira Shaikh | Alan Zemel | Tomek Strzalkowski | Bonnie J. Dorr
Proceedings for the First International Workshop on Social Threats in Online Conversations: Understanding and Management
We present a paradigm for extensible lexicon development based on Lexical Conceptual Structure to support social engineering detection and response generation. We leverage the central notions of ask (elicitation of behaviors such as providing access to money) and framing (risk/reward implied by the ask). We demonstrate improvements in ask/framing detection through refinements to our lexical organization and show that response generation qualitatively improves as ask/framing detection performance improves. The paradigm presents a systematic and efficient approach to resource adaptation for improved task-specific performance.
From Spatial Relations to Spatial Configurations
Soham Dan | Parisa Kordjamshidi | Julia Bonn | Archna Bhatia | Zheng Cai | Martha Palmer | Dan Roth
Proceedings of the Twelfth Language Resources and Evaluation Conference
Spatial Reasoning from language is essential for natural language understanding. Supporting it requires a representation scheme that can capture spatial phenomena encountered in language as well as in images and videos. Existing spatial representations are not sufficient for describing spatial configurations used in complex tasks. This paper extends the capabilities of existing spatial representation languages and increases coverage of the semantic aspects that are needed to ground spatial meaning of natural language text in the world. Our spatial relation language is able to represent a large, comprehensive set of spatial concepts crucial for reasoning and is designed to support composition of static and dynamic spatial configurations. We integrate this language with the Abstract Meaning Representation (AMR) annotation schema and present a corpus annotated by this extended AMR. To exhibit the applicability of our representation scheme, we annotate text taken from diverse datasets and show how we extend the capabilities of existing spatial representation languages with fine-grained decomposition of semantics and blend it seamlessly with AMRs of sentences and discourse representations as a whole.
2019
Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) and Grounded Communication for Robotics (RoboNLP)
Archna Bhatia | Yonatan Bisk | Parisa Kordjamshidi | Jesse Thomason
Proceedings of the Combined Workshop on Spatial Language Understanding (SpLU) and Grounded Communication for Robotics (RoboNLP)
2018
Proceedings of the First International Workshop on Spatial Language Understanding
Parisa Kordjamshidi | Archna Bhatia | James Pustejovsky | Marie-Francine Moens
Proceedings of the First International Workshop on Spatial Language Understanding
Edition 1.1 of the PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Carlos Ramisch | Silvio Ricardo Cordeiro | Agata Savary | Veronika Vincze | Verginica Barbu Mititelu | Archna Bhatia | Maja Buljan | Marie Candito | Polona Gantar | Voula Giouli | Tunga Güngör | Abdelati Hawwari | Uxoa Iñurrieta | Jolanta Kovalevskaitė | Simon Krek | Timm Lichte | Chaya Liebeskind | Johanna Monti | Carla Parra Escartín | Behrang QasemiZadeh | Renata Ramisch | Nathan Schneider | Ivelina Stoyanova | Ashwini Vaidya | Abigail Walsh
Proceedings of the Joint Workshop on Linguistic Annotation, Multiword Expressions and Constructions (LAW-MWE-CxG-2018)
This paper describes the PARSEME Shared Task 1.1 on automatic identification of verbal multiword expressions. We present the annotation methodology, focusing on changes from last year’s shared task. Novel aspects include enhanced annotation guidelines, additional annotated data for most languages, corpora for some new languages, and new evaluation settings. Corpora were created for 20 languages, which are also briefly discussed. We report organizational principles behind the shared task and the evaluation metrics employed for ranking. The 17 participating systems, their methods and obtained results are also presented and analysed.
2017
Double Trouble: The Problem of Construal in Semantic Annotation of Adpositions
Jena D. Hwang | Archna Bhatia | Na-Rae Han | Tim O’Gorman | Vivek Srikumar | Nathan Schneider
Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017)
We consider the semantics of prepositions, revisiting a broad-coverage annotation scheme used for annotating all 4,250 preposition tokens in a 55,000 word corpus of English. Attempts to apply the scheme to adpositions and case markers in other languages, as well as some problematic cases in English, have led us to reconsider the assumption that an adposition’s lexical contribution is equivalent to the role/relation that it mediates. Our proposal is to embrace the potential for construal in adposition use, expressing such phenomena directly at the token level to manage complexity and avoid sense proliferation. We suggest a framework to represent both the scene role and the adposition’s lexical function so they can be annotated at scale—supporting automatic, statistical processing of domain-general language—and discuss how this representation would allow for a simpler inventory of labels.
Compositionality in Verb-Particle Constructions
Archna Bhatia | Choh Man Teng | James Allen
Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017)
We are developing a broad-coverage deep semantic lexicon for a system that parses sentences into a logical form expressed in a rich ontology that supports reasoning. In this paper we look at verb-particle constructions (VPCs), and the extent to which they can be treated compositionally vs idiomatically. First we distinguish between the different types of VPCs based on their compositionality and then present a set of heuristics for classifying specific instances as compositional or not. We then identify a small set of general sense classes for particles when used compositionally and discuss the resulting lexical representations that are being added to the lexicon. By treating VPCs as compositional whenever possible, we attain broad coverage in a compact way, and also enable interpretations of novel VPC usages not explicitly present in the lexicon.
Characterization of Divergence in Impaired Speech of ALS Patients
Archna Bhatia | Bonnie Dorr | Kristy Hollingshead | Samuel L. Phillips | Barbara McKenzie
BioNLP 2017
Approximately 80% to 95% of patients with Amyotrophic Lateral Sclerosis (ALS) eventually develop speech impairments, such as defective articulation, slow laborious speech and hypernasality. The relationship between impaired speech and asymptomatic speech may be seen as a divergence from a baseline. This relationship can be characterized in terms of measurable combinations of phonological characteristics that are indicative of the degree to which the two diverge. We demonstrate that divergence measurements based on phonological characteristics of speech correlate with physiological assessments of ALS. Speech-based assessments offer benefits over commonly-used physiological assessments in that they are inexpensive, non-intrusive, and do not require trained clinical personnel for administering and interpreting the results.
2014
The CMU Machine Translation Systems at WMT 2014
Austin Matthews | Waleed Ammar | Archna Bhatia | Weston Feely | Greg Hanneman | Eva Schlinger | Swabha Swayamdipta | Yulia Tsvetkov | Alon Lavie | Chris Dyer
Proceedings of the Ninth Workshop on Statistical Machine Translation
A Corpus of Participant Roles in Contentious Discussions
Siddharth Jain | Archna Bhatia | Angelique Rein | Eduard Hovy
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
The expansion of social roles is, nowadays, a fact due to the ability of users to interact, discuss, exchange ideas and opinions, and form social networks through social media. Users in online social environments play a variety of social roles. The concept of “social role” has long been used in social science to describe the intersection of behavioural, meaningful, and structural attributes that emerge regularly in particular settings. In this paper, we present a new corpus for social roles in online contentious discussions. We explore various behavioural attributes such as stubbornness, sensibility, influence, and ignorance to create a model of social roles to distinguish among the various social roles participants assume in such a setup. We annotate discussions drawn from two different sets of corpora in order to ensure that our model of social roles and their signals hold up in general. We discuss the various criteria for deciding values for each of the behavioural attributes that define the roles.
Augmenting English Adjective Senses with Supersenses
Yulia Tsvetkov | Nathan Schneider | Dirk Hovy | Archna Bhatia | Manaal Faruqui | Chris Dyer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We develop a supersense taxonomy for adjectives, based on that of GermaNet, and apply it to English adjectives in WordNet using human annotation and supervised classification. Results show that accuracy for automatic adjective type classification is high, but synsets are considerably more difficult to classify, even for trained human annotators. We release the manually annotated data, the classifier, and the induced supersense labeling of 12,304 WordNet adjective synsets.
A Unified Annotation Scheme for the Semantic/Pragmatic Components of Definiteness
Archna Bhatia | Mandy Simons | Lori Levin | Yulia Tsvetkov | Chris Dyer | Jordan Bender
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We present a definiteness annotation scheme that captures the semantic, pragmatic, and discourse information, which we call communicative functions, associated with linguistic descriptions such as “a story about my speech”, “the story”, “every time I give it”, “this slideshow”. A survey of the literature suggests that definiteness does not express a single communicative function but is a grammaticalization of many such functions, for example, identifiability, familiarity, uniqueness, specificity. Our annotation scheme unifies ideas from previous research on definiteness while attempting to remove redundancy and make it easily annotatable. This annotation scheme encodes the communicative functions of definiteness rather than the grammatical forms of definiteness. We assume that the communicative functions are largely maintained across languages while the grammaticalization of this information may vary. One of the final goals is to use our semantically annotated corpora to discover how definiteness is grammaticalized in different languages. We release our annotated corpora for English and Hindi, and sample annotations for Hebrew and Russian, together with an annotation manual.
Automatic Classification of Communicative Functions of Definiteness
Archna Bhatia | Chu-Cheng Lin | Nathan Schneider | Yulia Tsvetkov | Fatima Talib Al-Raisi | Laleh Roostapour | Jordan Bender | Abhimanu Kumar | Lori Levin | Mandy Simons | Chris Dyer
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers
A Dependency Parser for Tweets
Lingpeng Kong | Nathan Schneider | Swabha Swayamdipta | Archna Bhatia | Chris Dyer | Noah A. Smith
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
2013
Generating English Determiners in Phrase-Based Translation with Synthetic Translation Options
Yulia Tsvetkov | Chris Dyer | Lori Levin | Archna Bhatia
Proceedings of the Eighth Workshop on Statistical Machine Translation
2010
PropBank Annotation of Multilingual Light Verb Constructions
Jena D. Hwang | Archna Bhatia | Claire Bonial | Aous Mansouri | Ashwini Vaidya | Nianwen Xue | Martha Palmer
Proceedings of the Fourth Linguistic Annotation Workshop
Empty Categories in a Hindi Treebank
Archna Bhatia | Rajesh Bhatt | Bhuvana Narasimhan | Martha Palmer | Owen Rambow | Dipti Misra Sharma | Michael Tepper | Ashwini Vaidya | Fei Xia
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
We are in the process of creating a multi-representational and multi-layered treebank for Hindi/Urdu (Palmer et al., 2009), which has three main layers: dependency structure, predicate-argument structure (PropBank), and phrase structure. This paper discusses an important issue in treebank design which is often neglected: the use of empty categories (ECs). All three levels of representation make use of ECs. We make a high-level distinction between two types of ECs, trace and silent, on the basis of whether they are postulated to mark displacement or not. Each type is further refined into several subtypes based on the underlying linguistic phenomena which the ECs are introduced to handle. This paper discusses the stages at which we add ECs to the Hindi/Urdu treebank and why. We methodically investigate the different types of ECs and their role in our syntactic and semantic representations. We also examine our decisions on whether or not to coindex each type of EC with other elements in the representation.