Helen Meng

Also published as: Helen M. Meng


Partner Personas Generation for Dialogue Response Generation
Hongyuan Lu | Wai Lam | Hong Cheng | Helen Meng
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Incorporating persona information enables diverse and engaging responses in dialogue response generation. Unfortunately, prior works have primarily focused on self personas and have overlooked the value of partner personas. Moreover, in practical applications, gold partner personas are often unavailable. This paper attempts to tackle these issues by offering a novel framework that leverages automatic partner personas generation to enhance the succeeding dialogue response generation. Our framework employs reinforcement learning with a specially designed critic network for reward judgement. Experimental results from automatic and human evaluations indicate that our framework is capable of generating relevant, interesting, coherent and informative partner personas, even compared to the ground truth partner personas. This enhances the succeeding dialogue response generation, which surpasses our competitive baselines that condition on the ground truth partner personas.

Toward Self-Learning End-to-End Task-oriented Dialog Systems
Xiaoying Zhang | Baolin Peng | Jianfeng Gao | Helen Meng
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue

End-to-end task bots are typically learned over a static and usually limited-size corpus. However, when deployed in dynamic, changing, and open environments to interact with users, task bots tend to fail when confronted with data that deviate from the training corpus, i.e., out-of-distribution samples. In this paper, we study the problem of automatically adapting task bots to changing environments by learning from human-bot interactions with minimal or no human annotation. We propose SL-Agent, a novel self-learning framework for building end-to-end task bots. SL-Agent consists of a dialog model and a pre-trained reward model to predict the quality of an agent response. It enables task bots to automatically adapt to changing environments by learning from the unlabeled human-bot dialog logs accumulated after deployment via reinforcement learning with the incorporated reward model. Experimental results on four well-studied dialog tasks, using both automatic and human evaluations, show the effectiveness of SL-Agent in automatically adapting to changing environments. We will release code and data for further research.

On Controlling Fallback Responses for Grounded Dialogue Generation
Hongyuan Lu | Wai Lam | Hong Cheng | Helen Meng
Findings of the Association for Computational Linguistics: ACL 2022

Dialogue agents can leverage external textual knowledge to generate responses of a higher quality. To the best of our knowledge, most existing works on knowledge grounded dialogue settings assume that the user intention is always answerable. Unfortunately, this is impractical as there is no guarantee that the knowledge retrievers could always retrieve the desired knowledge. Therefore, it is crucial to incorporate fallback responses to respond to unanswerable contexts appropriately while responding to the answerable contexts in an informative manner. We propose a novel framework that automatically generates a control token with the generator to bias the succeeding response towards informativeness for answerable contexts and fallback for unanswerable contexts in an end-to-end manner. Since no existing knowledge grounded dialogue dataset considers this aim, we augment the existing dataset with unanswerable contexts to conduct our experiments. Automatic and human evaluation results indicate that naively incorporating fallback responses with controlled text generation still hurts informativeness for answerable contexts. In contrast, our proposed framework effectively mitigates this problem while still appropriately presenting fallback responses to unanswerable contexts. Such a framework also removes the extra burden of the additional classifier and the overheads introduced in previous works, which operate in a pipeline manner.

Towards Identifying Social Bias in Dialog Systems: Framework, Dataset, and Benchmark
Jingyan Zhou | Jiawen Deng | Fei Mi | Yitong Li | Yasheng Wang | Minlie Huang | Xin Jiang | Qun Liu | Helen Meng
Findings of the Association for Computational Linguistics: EMNLP 2022

Among all the safety concerns that hinder the deployment of open-domain dialog systems (e.g., offensive language, biases, and toxic behaviors), social bias presents an insidious challenge. Addressing this challenge requires rigorous analyses and normative reasoning. In this paper, we focus our investigation on social bias measurement to facilitate the development of unbiased dialog systems. We first propose a novel Dial-Bias Framework for analyzing the social bias in conversations using a holistic method beyond bias lexicons or dichotomous annotations. Leveraging the proposed framework, we further introduce the CDial-Bias Dataset which is, to the best of our knowledge, the first annotated Chinese social bias dialog dataset. We also establish a fine-grained dialog bias measurement benchmark and conduct in-depth ablation studies to shed light on the utility of the detailed annotations in the proposed dataset. Finally, we evaluate representative Chinese generative models with our classifiers to unveil the presence of social bias in these systems.

COLD: A Benchmark for Chinese Offensive Language Detection
Jiawen Deng | Jingyan Zhou | Hao Sun | Chujie Zheng | Fei Mi | Helen Meng | Minlie Huang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Offensive language detection is increasingly crucial for maintaining a civilized social media platform and deploying pre-trained language models. However, this task in Chinese is still under-explored due to the scarcity of reliable datasets. To this end, we propose a benchmark, COLD, for Chinese offensive language analysis, including a Chinese Offensive Language Dataset (COLDATASET) and a baseline detector (COLDETECTOR) trained on the dataset. We show that the COLD benchmark contributes to Chinese offensive language detection, which is challenging for existing resources. We then deploy the COLDETECTOR and conduct detailed analyses on popular Chinese pre-trained language models. We first analyze the offensiveness of existing generative models and show that these models inevitably expose varying degrees of offensive issues. Furthermore, we investigate the factors that influence the offensive generations, and we find that anti-bias contents and keywords referring to certain groups or revealing negative attitudes trigger offensive outputs more easily.

Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout
Kun Li | Tianhua Zhang | Liping Tang | Junan Li | Hongyuan Lu | Xixin Wu | Helen Meng
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering

MultiDoc2Dial presents an important challenge on modeling dialogues grounded with multiple documents. This paper proposes a pipeline system of “retrieve, re-rank, and generate”, where each component is individually optimized. This enables the passage re-ranker and response generator to fully exploit training with ground-truth data. Furthermore, we use a deep cross-encoder trained with localized hard negative passages from the retriever. For the response generator, we use grounding span prediction as an auxiliary task to be jointly trained with the main task of response generation. We also adopt a passage dropout and regularization technique to improve response generation performance. Experimental results indicate that the system clearly surpasses the competitive baseline and our team CPII-NLP ranked 1st among the public submissions on ALL four leaderboards based on the sum of F1, SacreBLEU, METEOR and RougeL scores.

Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis
Xueyuan Chen | Shun Lei | Zhiyong Wu | Dong Xu | Weifeng Zhao | Helen Meng
Proceedings of the 29th International Conference on Computational Linguistics

Naturalness and expressiveness are crucial for audiobook speech synthesis, but are currently limited by the averaged global-scale speaking style representation. In this paper, we propose an unsupervised multi-scale context-sensitive text-to-speech model for audiobooks. A multi-scale hierarchical context encoder is specially designed to predict both global-scale context style embedding and local-scale context style embedding from a wider context of input text in a hierarchical manner. Likewise, a multi-scale reference encoder is introduced to extract reference style embeddings at both global and local scales from the reference speech, which are used to guide the prediction of speaking styles. On top of these, a bi-reference attention mechanism is used to align both the local-scale reference style embedding sequence and the local-scale context style embedding sequence with the corresponding phoneme embedding sequence. Both objective and subjective experiment results on a real-world multi-speaker Mandarin novel audio dataset demonstrate the excellent performance of our proposed method over all baselines in terms of naturalness and expressiveness of the synthesized speech.


Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings
Pengfei Liu | Shafiq Joty | Helen Meng
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

Analysis of Dysarthric Speech using Distinctive Feature Recognition
Ka Ho Wong | Yu Ting Yeung | Patrick C. M. Wong | Gina-Anne Levow | Helen Meng
Proceedings of SLPAT 2015: 6th Workshop on Speech and Language Processing for Assistive Technologies


SeemGo: Conditional Random Fields Labeling and Maximum Entropy Classification for Aspect Based Sentiment Analysis
Pengfei Liu | Helen Meng
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014)


Automatic Story Segmentation using a Bayesian Decision Framework for Statistical Models of Lexical Chain Features
Wai-Kit Lo | Wenying Xiong | Helen Meng
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

Developing Speech Recognition and Synthesis Technologies to Support Computer-Aided Pronunciation Training for Chinese Learners of English
Helen Meng
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 1


Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News
Lei Xie | Chuan Liu | Helen Meng
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers


A Maximum Entropy Framework that Integrates Word Dependencies and Grammatical Relations for Reading Comprehension
Kui Xu | Helen Meng | Fuliang Weng
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers


Design and Development of a Bilingual Reading Comprehension Corpus
Kui Xu | Helen Meng
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 2, June 2005: Special Issue on Annotated Speech Corpora

The Use of Metadata, Web-derived Answer Patterns and Passage Context to Improve Reading Comprehension Performance
Yongping Du | Helen Meng | Xuanjing Huang | Lide Wu
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing


Automatic Grammar Partitioning for Syntactic Parsing
Po Chui Luk | Fuliang Weng | Helen Meng
Proceedings of the Seventh International Workshop on Parsing Technologies

Design, Compilation and Processing of CUCall: A Set of Cantonese Spoken Language Corpora Collected Over Telephone Networks
W.K. Lo | P.C. Ching | Tan Lee | Helen Meng
Proceedings of Research on Computational Linguistics Conference XIV

Mandarin-English Information: Investigating Translingual Speech Retrieval
Helen Meng | Berlin Chen | Sanjeev Khudanpur | Gina-Anne Levow | Wai-Kit Lo | Douglas Oard | Patrick Shone | Karen Tang | Hsin-Min Wang | Jianqiang Wang
Proceedings of the First International Conference on Human Language Technology Research

Scalability and Portability of a Belief Network-based Dialog Model for Different Application Domains
Carmen Wai | Helen M. Meng | Roberto Pieraccini
Proceedings of the First International Conference on Human Language Technology Research


Parsing a Lattice with Multiple Grammars
Fuliang Weng | Helen Meng | Po Chui Luk
Proceedings of the Sixth International Workshop on Parsing Technologies

Efficiency, memory, ambiguity, robustness and scalability are the central issues in natural language parsing. Because of the complexity of natural language, different parsers may be suited only to certain subgrammars. In addition, grammar maintenance and updating may have adverse effects on tuned parsers. Motivated by these concerns, [25] proposed a grammar partitioning and top-down parser composition mechanism for loosely restricted Context-Free Grammars (CFGs). In this paper, we report on significant progress, i.e., (1) developing guidelines for the grammar partition through a set of heuristics, (2) devising a new mix-strategy composition algorithm for any rule-based grammar partition in a lattice framework, and (3) initial but encouraging parsing results for Chinese and English queries from an Air Travel Information System (ATIS) corpus.

Mandarin-English Information (MEI): Investigating Translingual Speech Retrieval
Helen Meng | Sanjeev Khudanpur | Gina Levow | Douglas W. Oard | Hsin-Min Wang
ANLP-NAACL 2000 Workshop: Embedded Machine Translation Systems


An Analytical Study of Transformational Tagging for Chinese Text
Helen Meng | Chun Wah Ip
ROCLING 1999 Short Papers


Phonological Parsing for Bi-directional Letter-to-Sound/Sound-to-Letter Generation
Helen M. Meng | Stephanie Seneff | Victor W. Zue
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994


Signal Representation Attribute Extraction and the Use of Distinctive Features for Phonetic Classification
Helen M. Meng | Victor W. Zue | Hong C. Leung
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991