Ying Zhang

Also published as: Joy Ying Zhang


PM2F2N: Patient Multi-view Multi-modal Feature Fusion Networks for Clinical Outcome Prediction
Ying Zhang | Baohang Zhou | Kehui Song | Xuhui Sui | Guoqing Zhao | Ning Jiang | Xiaojie Yuan
Findings of the Association for Computational Linguistics: EMNLP 2022

Clinical outcome prediction is critical to the condition prediction of patients and management of hospital capacities. There are two kinds of medical data, including time series signals recorded by various devices and clinical notes in electronic health records (EHR), which are used for two common prediction targets: mortality and length of stay. Traditional methods focused on utilizing time series data but ignored clinical notes. With the development of deep learning, natural language processing (NLP) and multi-modal learning methods are exploited to jointly model the time series and clinical notes, which belong to different modalities. However, the existing methods failed to fuse the multi-modal features of patients from different views. Therefore, we propose the patient multi-view multi-modal feature fusion networks for clinical outcome prediction. Firstly, from the patient inner view, we propose to utilize the co-attention module to enhance the fine-grained feature interaction between time series and clinical notes from each patient. Secondly, the patient outer view is the correlation between patients, which can be reflected by the structural knowledge in clinical notes. We exploit the structural information extracted from clinical notes to construct the patient correlation graph, and fuse patients’ multi-modal features by graph neural networks (GNN). The experimental results on the MIMIC-III benchmark demonstrate the superiority of our method.

Improving Zero-Shot Entity Linking Candidate Generation with Ultra-Fine Entity Type Information
Xuhui Sui | Ying Zhang | Kehui Song | Baohang Zhou | Guoqing Zhao | Xin Wei | Xiaojie Yuan
Proceedings of the 29th International Conference on Computational Linguistics

Entity linking, which aims at aligning ambiguous entity mentions to their referent entities in a knowledge base, plays a key role in multiple natural language processing tasks. Recently, the zero-shot entity linking task has become a research hotspot; it links mentions to unseen entities to challenge the generalization ability of models. For this task, the training set and test set are from different domains, and thus entity linking models tend to overfit because they memorize the properties of entities that appear frequently in the training set. We argue that general ultra-fine-grained type information can help the linking models to learn contextual commonality and improve their generalization ability to tackle the overfitting problem. However, in the zero-shot entity linking setting, no type information is available and entities are only identified by textual descriptions. Thus, we first extract the ultra-fine entity type information from the entity textual descriptions. Then, we propose a hierarchical multi-task model to improve the high-level zero-shot entity linking candidate generation task by utilizing the entity typing task as an auxiliary low-level task, which introduces extracted ultra-fine type information into the candidate generation task. Experimental results demonstrate the effectiveness of utilizing the ultra-fine entity type information, and our proposed method achieves state-of-the-art performance.

Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances
Yike Wu | Yu Zhao | Shiwan Zhao | Ying Zhang | Xiaojie Yuan | Guoqing Zhao | Ning Jiang
Proceedings of the 29th International Conference on Computational Linguistics

Despite the great progress of Visual Question Answering (VQA), current VQA models heavily rely on the superficial correlation between the question type and its corresponding frequent answers (i.e., language priors) to make predictions, without really understanding the input. In this work, we define the training instances with the same question type but different answers as superficially similar instances, and attribute the language priors to the confusion of VQA model on such instances. To solve this problem, we propose a novel training framework that explicitly encourages the VQA model to distinguish between the superficially similar instances. Specifically, for each training instance, we first construct a set that contains its superficially similar counterparts. Then we exploit the proposed distinguishing module to increase the distance between the instance and its counterparts in the answer space. In this way, the VQA model is forced to further focus on the other parts of the input beyond the question type, which helps to overcome the language priors. Experimental results show that our method achieves the state-of-the-art performance on VQA-CP v2. Codes are available at Distinguishing-VQA.

A Span-based Multimodal Variational Autoencoder for Semi-supervised Multimodal Named Entity Recognition
Baohang Zhou | Ying Zhang | Kehui Song | Wenya Guo | Guoqing Zhao | Hongbin Wang | Xiaojie Yuan
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Multimodal named entity recognition (MNER) on social media is a challenging task which aims to extract named entities in free text and incorporate images to classify them into user-defined types. However, the annotation for named entities on social media demands a considerable amount of human effort. The existing semi-supervised named entity recognition methods focus on the text modality and are utilized to reduce labeling costs in traditional NER. However, the previous methods are not efficient for semi-supervised MNER, because the MNER task is defined to combine the text information with image information and needs to consider the mismatch between the posted text and image. To fuse the text and image features for MNER effectively under the semi-supervised setting, we propose a novel span-based multimodal variational autoencoder (SMVAE) model for semi-supervised MNER. The proposed method exploits modality-specific VAEs to model text and image latent features, and utilizes product-of-experts to acquire multimodal features. In our approach, the implicit relations between labels and multimodal features are modeled by the multimodal VAE. Thus, the useful information of unlabeled data can be exploited in our method under the semi-supervised setting. Experimental results on two benchmark datasets demonstrate that our approach not only outperforms baselines under the supervised setting, but also improves MNER performance with less labeled data than existing semi-supervised methods.
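
The product-of-experts step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each modality-specific VAE outputs a one-dimensional Gaussian posterior, which the abstract does not state; the function name and the toy values are hypothetical and are not taken from the SMVAE implementation. For Gaussian experts, the product is again Gaussian, with precisions (inverse variances) added and the mean weighted by precision.

```python
def product_of_gaussian_experts(means, variances):
    """Fuse 1-D Gaussian experts N(mean_i, variance_i) into a single Gaussian.

    Assumption for illustration: every expert is Gaussian, so the normalized
    product is Gaussian with summed precisions and a precision-weighted mean.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    fused_mean = sum(m * p for m, p in zip(means, precisions)) / total_precision
    fused_variance = 1.0 / total_precision
    return fused_mean, fused_variance


# Hypothetical text expert N(0, 1) and image expert N(2, 1).
fused_mean, fused_variance = product_of_gaussian_experts([0.0, 2.0], [1.0, 1.0])
```

With two equally confident experts the fused mean lies halfway between them and the fused variance is halved, which is why a product of experts yields a sharper estimate than either modality alone.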

MoSE: Modality Split and Ensemble for Multimodal Knowledge Graph Completion
Yu Zhao | Xiangrui Cai | Yike Wu | Haiwei Zhang | Ying Zhang | Guoqing Zhao | Ning Jiang
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Multimodal knowledge graph completion (MKGC) aims to predict missing entities in MKGs. Previous works usually share relation representation across modalities. This results in mutual interference between modalities during training, since for a pair of entities, the relation from one modality probably contradicts that from another modality. Furthermore, making a unified prediction based on the shared relation representation treats the input in different modalities equally, while their importance to the MKGC task should be different. In this paper, we propose MoSE, a Modality Split representation learning and Ensemble inference framework for MKGC. Specifically, in the training phase, we learn modality-split relation embeddings for each modality instead of a single modality-shared one, which alleviates the modality interference. Based on these embeddings, in the inference phase, we first make modality-split predictions and then exploit various ensemble methods to combine the predictions with different weights, which models the modality importance dynamically. Experimental results on three KG datasets show that MoSE outperforms state-of-the-art MKGC methods. Codes are available at https://github.com/OreOZhao/MoSE4MKGC.


An End-to-End Progressive Multi-Task Learning Framework for Medical Named Entity Recognition and Normalization
Baohang Zhou | Xiangrui Cai | Ying Zhang | Xiaojie Yuan
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

Medical named entity recognition (NER) and normalization (NEN) are fundamental for constructing knowledge graphs and building QA systems. Existing implementations for medical NER and NEN suffer from error propagation between the two tasks. The mispredicted mentions from NER directly influence the results of NEN. Therefore, the NER module is the bottleneck of the whole system. Besides, the learnable features for both tasks are beneficial to improving the model performance. To avoid the disadvantages of existing models and exploit the generalized representation across the two tasks, we design an end-to-end progressive multi-task learning model for jointly modeling medical NER and NEN in an effective way. There are three levels of tasks with progressive difficulty in the framework. The progressive tasks can reduce the error propagation through the incremental task settings, which means that the lower-level tasks gain supervision signals, rather than errors, from the higher-level tasks to improve their performance. Besides, the context features are exploited to enrich the semantic information of entity mentions extracted by NER. The performance of NEN profits from the enhanced entity mention features. The standard entities from knowledge bases are introduced into the NER module for extracting corresponding entity mentions correctly. The empirical results on two publicly available medical literature datasets demonstrate the superiority of our method over nine typical methods.

Target-oriented Fine-tuning for Zero-Resource Named Entity Recognition
Ying Zhang | Fandong Meng | Yufeng Chen | Jinan Xu | Jie Zhou
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Generic Mechanism for Reducing Repetitions in Encoder-Decoder Models
Ying Zhang | Hidetaka Kamigaito | Tatsuya Aoki | Hiroya Takamura | Manabu Okumura
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)

Encoder-decoder models have been commonly used for many tasks such as machine translation and response generation. As previous research reported, these models suffer from generating redundant repetition. In this research, we propose a new mechanism for encoder-decoder models that estimates the semantic difference of a source sentence before and after being fed into the encoder-decoder model to capture the consistency between two sides. This mechanism helps reduce repeatedly generated tokens for a variety of tasks. Evaluation results on publicly available machine translation and response generation datasets demonstrate the effectiveness of our proposal.

A Language Model-based Generative Classifier for Sentence-level Discourse Parsing
Ying Zhang | Hidetaka Kamigaito | Manabu Okumura
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Discourse segmentation and sentence-level discourse parsing play important roles for various NLP tasks to consider textual coherence. Despite recent achievements in both tasks, there is still room for improvement due to the scarcity of labeled data. To solve the problem, we propose a language model-based generative classifier (LMGC) for using more information from labels by treating the labels as an input while enhancing label representations by embedding descriptions for each label. Moreover, since this enables LMGC to prepare representations for labels unseen in the pre-training step, we can effectively use a pre-trained language model in LMGC. Experimental results on the RST-DT dataset show that our LMGC achieved the state-of-the-art F1 score of 96.72 in discourse segmentation. It further achieved the state-of-the-art relation F1 scores of 84.69 with gold EDU boundaries and 81.18 with automatically segmented boundaries, respectively, in sentence-level discourse parsing.


Jibiki-LINKS: a tool between traditional dictionaries and lexical networks for modelling lexical resources
Ying Zhang | Mathieu Mangeot | Valérie Bellynck | Christian Boitet
Proceedings of the 4th Workshop on Cognitive Aspects of the Lexicon (CogALex)


Measuring the Structural Importance through Rhetorical Structure Index
Narine Kokhlikyan | Alex Waibel | Yuqi Zhang | Joy Ying Zhang
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Complex terminologies management - the case of acronyms (Gestion des terminologies riches : l’exemple des acronymes) [in French]
Ying Zhang | Mathieu Mangeot
Proceedings of TALN 2013 (Volume 2: Short Papers)

iMAG : MT-postediting, translation quality evaluation and parallel corpus production (iMAG : post-édition, évaluation de qualité de TA et production d’un corpus parallèle) [in French]
Lingxiao Wang | Ying Zhang
Proceedings of TALN 2013 (Volume 3: System Demonstrations)


Machine Translation with Binary Feedback: a Large-Margin Approach
Avneesh Saluja | Ian Lane | Ying Zhang
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

Viewing machine translation as a structured classification problem has provided a gateway for a host of structured prediction techniques to enter the field. In particular, large-margin structured prediction methods for discriminative training of feature weights, such as the structured perceptron or MIRA, have started to match or exceed the performance of existing methods such as MERT. One issue with structured problems in general is the difficulty in obtaining fully structured labels, e.g., in machine translation, obtaining reference translations or parallel sentence corpora for arbitrary language pairs. Another issue, more specific to the translation domain, is the difficulty in online training of machine translation systems, since existing methods often require bilingual knowledge to correct translation output online. We propose a solution to these two problems, by demonstrating a way to incorporate binary-labeled feedback (i.e., feedback on whether a translation hypothesis is a “good” or understandable one or not), a form of supervision that can be easily integrated in an online manner, into a machine translation framework. Experimental results show marked improvement by incorporating binary feedback on unseen test data, with gains exceeding 5.5 BLEU points.

Integrating MT with Digital Collections for Multilingual Information Access
Jiangping Chen | Olajumoke Agozu | Wenqian Zhao | Cheng Chieh Lien | Ryan Knudson | Ying Zhang
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Commercial MT User Program

This paper describes the role of machine translation (MT) for multilingual information access, a service that is desired by digital libraries that wish to provide cross-cultural access to their collections. To understand the performance of MT, we have developed HeMT: an integrated multilingual evaluation platform (http://txcdk-v10.unt.edu/HeMT/) to facilitate human evaluation of machine translation. The results of human evaluation using HeMT on three online MT services are reported. Challenges and benefits of crowdsourcing and collaboration based on our experience are discussed. Additionally, we present the analysis of the translation errors and propose Multi-engine MT strategies to improve translation performance.

Demo of iMAG Possibilities: MT-postediting, Translation Quality Evaluation, Parallel Corpus Production
Ling Xiao Wang | Ying Zhang | Christian Boitet | Valerie Bellynck
Proceedings of COLING 2012: Demonstration Papers


Context-aware Language Modeling for Conversational Speech Translation
Avneesh Saluja | Ian Lane | Ying Zhang
Proceedings of Machine Translation Summit XIII: Papers


A Language Approach to Modeling Human Behaviors
Peng-Wen Chen | Snehal Kumar Chennuru | Ying Zhang
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The modeling of human behavior becomes more and more important due to the increasing popularity of context-aware computing and people-centric mobile applications. Inspired by the principle of action-as-language, we propose that human ambulatory behavior shares similar properties with natural languages. In addition, by exploiting this similarity, we will be able to index, recognize, cluster, retrieve, and infer high-level semantic meanings of human behaviors via the use of natural language processing techniques. In this paper, we developed a Life Logger system to help build the behavior language corpus which supports our "Behavior as Language" research. The constructed behavior corpus shows Zipf's distribution over the frequency of vocabularies, which is aligned with our "Behavior as Language" assumption. Our preliminary results of using a smoothed n-gram language model for activity recognition achieved an average accuracy rate of 94% in distinguishing among human ambulatory behaviors including walking, running, and cycling. This behavior-as-language corpus will enable researchers to study higher level human behavior based on the syntactic and semantic analysis of the corpus data.
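
The idea of recognizing activities with a smoothed n-gram language model can be sketched as follows. This is only an illustration under stated assumptions: sensor readings are taken to be already quantized into discrete symbols, the model order is fixed at bigrams, and add-one smoothing stands in for whatever smoothing the authors used. The symbol sequences, labels, and function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(sequences, vocab):
    """Return a log-probability function for an add-one-smoothed bigram model."""
    bigrams, contexts = Counter(), Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq)  # "<s>" marks the start of a sequence
        for prev, cur in zip(padded, padded[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    size = len(vocab)

    def log_prob(seq):
        padded = ["<s>"] + list(seq)
        return sum(
            math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + size))
            for prev, cur in zip(padded, padded[1:])
        )

    return log_prob


def classify(seq, models):
    """Pick the activity whose language model gives the sequence the highest score."""
    return max(models, key=lambda label: models[label](seq))


# Hypothetical behavior symbols: each character is one quantized sensor reading.
vocab = {"a", "b", "c"}
models = {
    "walking": train_bigram(["ababab", "abab"], vocab),
    "running": train_bigram(["cccc", "cbcc"], vocab),
}
predicted = classify("abab", models)
```

One model is trained per activity, and smoothing keeps unseen symbol pairs from driving a sequence's probability to zero, so every candidate activity still receives a finite score.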


Virtual Babel: Towards Context-Aware Machine Translation in Virtual Worlds
Ying Zhang | Nguyen Bach
Proceedings of Machine Translation Summit XII: Posters


Domain-Specific Query Translation for Multilingual Information Access using Machine Translation Augmented With Dictionaries Mined from Wikipedia
Gareth Jones | Fabio Fantino | Eamonn Newman | Ying Zhang
Proceedings of the 2nd workshop on Cross Lingual Information Access (CLIA) Addressing the Information Need of Multilingual Societies


Multilingual Search for Cultural Heritage Archives via Combining Multiple Translation Resources
Gareth J. F. Jones | Ying Zhang | Eamonn Newman | Fabio Fantino | Franca Debole
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)

Enhancing image-based Arabic document translation using noisy channel correction model
Yi Chang | Ying Zhang | Stephan Vogel | Jie Yang
Proceedings of Machine Translation Summit XI: Papers

PanDoRA: a large-scale two-way statistical machine translation system for hand-held devices
Ying Zhang | Stephan Vogel
Proceedings of Machine Translation Summit XI: Papers

The CMU-UKA statistical machine translation systems for IWSLT 2007
Ian Lane | Andreas Zollmann | Thuy Linh Nguyen | Nguyen Bach | Ashish Venugopal | Stephan Vogel | Kay Rottmann | Ying Zhang | Alex Waibel
Proceedings of the Fourth International Workshop on Spoken Language Translation

This paper describes the CMU-UKA statistical machine translation systems submitted to the IWSLT 2007 evaluation campaign. Systems were submitted for three language-pairs: Japanese→English, Chinese→English and Arabic→English. All systems were based on a common phrase-based SMT (statistical machine translation) framework but for each language-pair a specific research problem was tackled. For Japanese→English we focused on two problems: first, punctuation recovery, and second, how to incorporate topic-knowledge into the translation framework. Our Chinese→English submission focused on syntax-augmented SMT and for the Arabic→English task we focused on incorporating morphological-decomposition into the SMT framework. This research strategy enabled us to evaluate a wide variety of approaches which proved effective for the language pairs they were evaluated on.


Distributed Language Modeling for N-best List Re-ranking
Ying Zhang | Almut Silja Hildebrand | Stephan Vogel
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing


Competitive Grouping in Integrated Phrase Segmentation and Alignment Model
Ying Zhang | Stephan Vogel
Proceedings of the ACL Workshop on Building and Using Parallel Texts

An efficient phrase-to-phrase alignment model for arbitrarily long phrase and large corpora
Ying Zhang | Stephan Vogel
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

Mining Key Phrase Translations from Web Corpora
Fei Huang | Ying Zhang | Stephan Vogel
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing


Interpreting BLEU/NIST Scores: How Much Improvement do We Need to Have a Better System?
Ying Zhang | Stephan Vogel | Alex Waibel
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

Measuring confidence intervals for the machine translation evaluation metrics
Ying Zhang | Stephan Vogel
Proceedings of the 10th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages


The CMU statistical machine translation system
Stephan Vogel | Ying Zhang | Fei Huang | Alicia Tribble | Ashish Venugopal | Bing Zhao | Alex Waibel
Proceedings of Machine Translation Summit IX: Papers

In this paper we describe the components of our statistical machine translation system. This system combines phrase-to-phrase translations extracted from a bilingual corpus using different alignment approaches. Special methods to extract and align named entities are used. We show how a manual lexicon can be incorporated into the statistical system in an optimized way. Experiments on Chinese-to-English and Arabic-to-English translation tasks are presented.


Pre-processing of bilingual corpora for Mandarin-English EBMT
Ying Zhang | Ralf Brown | Robert Frederking | Alon Lavie
Proceedings of Machine Translation Summit VIII

Pre-processing of bilingual corpora plays an important role in Example-Based Machine Translation (EBMT) and Statistical-Based Machine Translation (SBMT). For our Mandarin-English EBMT system, pre-processing includes segmentation for Mandarin, bracketing for English and building a statistical dictionary from the corpora. We used the Mandarin segmenter from the Linguistic Data Consortium (LDC). It uses dynamic programming with a frequency dictionary to segment the text. Although the frequency dictionary is large, it does not completely cover the corpora. In this paper, we describe the work we have done to improve the segmentation for Mandarin and the bracketing process for English to increase the length of English phrases. A statistical dictionary is built from the aligned bilingual corpus. It is used as feedback to segmentation and bracketing to re-segment / re-bracket the corpus. The process iterates several times to achieve better results. The final results of the corpus pre-processing are a segmented/bracketed aligned bilingual corpus and a statistical dictionary. We achieved positive results, increasing the average length of Chinese terms by about 60% and of English terms by about 10%. The statistical dictionary gained about a 30% increase in coverage.

Adapting an Example-Based Translation System to Chinese
Ying Zhang | Ralf D. Brown | Robert E. Frederking
Proceedings of the First International Conference on Human Language Technology Research

Towards Automatic Sign Translation
Jie Yang | Jiang Gao | Ying Zhang | Alex Waibel
Proceedings of the First International Conference on Human Language Technology Research