2024
A Typology of Errors for User Utterances in Chatbots
Anu Singh | Esme Manandise
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
This paper discusses the challenges that non-prescriptive language uses in chatbot communication create for Semantic Parsing (SP). To help SP developers improve their systems, we propose a flexible error typology based on an analysis of a sample of non-prescriptive language uses mined from domain-specific chatbot logs. This typology is not tied to any specific language model. We also present a framework for automatically mapping these errors to the typology. Finally, we show how our framework can help evaluate SP systems from a linguistic robustness perspective. Our framework can be expanded to include new classes of errors across different domains and user demographics.
2020
Mitigating Silence in Compliance Terminology during Parsing of Utterances
Esme Manandise | Conrad de Peuter
Proceedings of the 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation
This paper reports on an approach to increase multi-token-term recall in a parsing task. We use a compliance-domain parser to extract, during the process of parsing raw text, terms that are unlisted in the terminology. The parser uses a similarity measure (Generalized Dice Coefficient) between listed terms and unlisted term candidates to (i) determine term status, (ii) serve putative terms to the parser, (iii) decrease parsing complexity by glomming multi-tokens as lexical singletons, and (iv) automatically augment the terminology after parsing of an utterance completes. We illustrate a small experiment with examples from the tax-and-regulations domain. Bootstrapping the parsing process to detect out-of-vocabulary terms at runtime increases parsing accuracy in addition to producing other benefits to a natural-language-processing pipeline, which translates arithmetic calculations written in English into computer-executable operations.
2019
Towards Unlocking the Narrative of the United States Income Tax Forms
Esme Manandise
Proceedings of the Second Financial Narrative Processing Workshop (FNP 2019)
2015
The Bare Necessities: Increasing Lexical Coverage for Multi-Word Domain Terms with Less Lexical Data
Branimir Boguraev | Esme Manandise | Benjamin Segal
Proceedings of the 11th Workshop on Multiword Expressions
2002
Using word formation rules to extend MT lexicons
Claudia Gdaniec | Esmé Manandise
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Technical Papers
In the IBM LMT Machine Translation (MT) system, a built-in strategy provides lexical coverage of a particular subset of words that are not listed in its bilingual lexicons. The recognition and coding of these words and their transfer generation is based on a set of derivational morphological rules. A new utility extends unfound words of this type in an LMT-compatible format in an auxiliary bilingual lexical file to be subsequently merged into the core lexicons. What characterizes this approach is the use of morphological, semantic, and syntactic features for both analysis and transfer. The auxiliary lexical file (ALF) has to be revised before a merge into the core lexicons. This utility integrates a linguistics-based analysis and transfer rules with a corpus-based method of verifying or falsifying linguistic hypotheses against extensive document translation, which in addition yields statistics on frequencies of occurrence as well as local context.
2001
Derivational morphology to the rescue: how it can help resolve unfound words in MT
Claudia Gdaniec | Esmé Manandise | Michael C. McCord
Proceedings of Machine Translation Summit VIII
Machine Translation (MT) systems that process unrestricted text should be able to deal with words that are not found in the MT lexicon. Without some kind of recognition, the parse may be incomplete, there is no transfer for the unfound word, and tests for transfers for surrounding words will often fail, resulting in poor translation. Interestingly, not much has been published on unfound-word guessing in the context of MT although such work has been going on for other applications. In our work on the IBM MT system, we implemented a far-reaching strategy for recognizing unfound words based on rules of word formation and for generating transfers. What distinguishes our approach from others is the use of semantic and syntactic features for both analysis and transfer, a scoring system to assign levels of confidence to possible word structures, and the creation of transfers in the transformation component. We also successfully applied rules of derivational morphological analysis to non-derived unfound words.
1989
Book Reviews: New Directions in Machine Translation (Proceedings of the Conference, Budapest, August 1988)
Esmeralda Manandise
Computational Linguistics, Volume 15, Number 4, December 1989