Khalil Sima’an

Also published as: K. Sima’an

2024

pdf abs
Continual Reinforcement Learning for Controlled Text Generation
Velizar Shulev | Khalil Sima’an
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

Controlled Text Generation (CTG) steers the generation of continuations of a given context (prompt) by a Large Language Model (LLM) towards texts possessing a given attribute (e.g., topic, sentiment). In this paper we view CTG as a Continual Learning problem: how to learn at every step to steer next-word generation, without having to wait for end-of-sentence. This continual view is useful for online applications such as CTG for speech, where end-of-sentence is often uncertain. We depart from an existing model, the Plug-and-Play language models (PPLM), which perturbs the context at each step to better predict next-words that posses the desired attribute. While PPLM is intricate and has many hyper-parameters, we provide a proof that the PPLM objective function can be reduced to a Continual Reinforcement Learning (CRL) reward function, thereby simplifying PPLM and endowing it with a better understood learning framework. Subsequently, we present, the first of its kind, CTG algorithm that is fully based on CRL and exhibit promising empirical results.

2022

pdf abs
Passing Parser Uncertainty to the Transformer: Labeled Dependency Distributions for Neural Machine Translation
Dongqi Pu | Khalil Sima’an
Proceedings of the 23rd Annual Conference of the European Association for Machine Translation

Existing syntax-enriched neural machine translation (NMT) models work either with the single most-likely unlabeled parse or the set of n-best unlabeled parses coming out of an external parser. Passing a single or n-best parses to the NMT model risks propagating parse errors. Furthermore, unlabeled parses represent only syntactic groupings without their linguistically relevant categories. In this paper we explore the question: Does passing both parser uncertainty and labeled syntactic knowledge to the Transformer improve its translation performance? This paper contributes a novel method for infusing the whole labeled dependency distributions (LDD) of the source sentence’s dependency forest into the self-attention mechanism of the encoder of the Transformer. A range of experimental results on three language pairs demonstrate that the proposed approach outperforms both the vanilla Transformer as well as the single best-parse Transformer model across several evaluation metrics.

2018

pdf abs
Deep Generative Model for Joint Alignment and Word Representation
Miguel Rios | Wilker Aziz | Khalil Sima’an
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)

This work exploits translation data as a source of semantically relevant learning signal for models of word representation. In particular, we exploit equivalence through translation as a form of distributional context and jointly learn how to embed and align with a deep generative model. Our EmbedAlign model embeds words in their complete observed context and learns by marginalisation of latent lexical alignments. Besides, it embeds words as posterior probability densities, rather than point estimates, which allows us to compare words in context using a measure of overlap between distributions (e.g. KL divergence). We investigate our model’s performance on a range of lexical semantics tasks achieving competitive results on several standard benchmarks including natural language inference, paraphrasing, and text similarity.

2017

pdf abs
Alternative Objective Functions for Training MT Evaluation Metrics
Miloš Stanojević | Khalil Sima’an
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

MT evaluation metrics are tested for correlation with human judgments either at the sentence- or the corpus-level. Trained metrics ignore corpus-level judgments and are trained for high sentence-level correlation only. We show that training only for one objective (sentence or corpus level), can not only harm the performance on the other objective, but it can also be suboptimal for the objective being optimized. To this end we present a metric trained for corpus-level and show empirical comparison against a metric trained for sentence-level exemplifying how their performance may vary per language pair, type and level of judgment. Subsequently we propose a model trained to optimize both objectives simultaneously and show that it is far more stable than–and on average outperforms–both models on both objectives.

pdf abs
Graph Convolutional Encoders for Syntax-aware Neural Machine Translation
Jasmijn Bastings | Ivan Titov | Wilker Aziz | Diego Marcheggiani | Khalil Sima’an
Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing

We present a simple and effective approach to incorporating syntactic structure into neural attention-based encoder-decoder models for machine translation. We rely on graph-convolutional networks (GCNs), a recent class of neural networks developed for modeling graph-structured data. Our GCNs use predicted syntactic dependency trees of source sentences to produce representations of words (i.e. hidden states of the encoder) that are sensitive to their syntactic neighborhoods. GCNs take word representations as input and produce word representations as output, so they can easily be incorporated as layers into standard encoders (e.g., on top of bidirectional RNNs or convolutional neural networks). We evaluate their effectiveness with English-German and English-Czech translation experiments for different types of encoders and observe substantial improvements over their syntax-agnostic versions in all the considered setups.

pdf
Elastic-substitution decoding for Hierarchical SMT: efficiency, richer search and double labels
Gideon Maillette de Buy Wenniger | Khalil Sima’an | Andy Way
Proceedings of Machine Translation Summit XVI: Research Track

2016

pdf
Word Alignment without NULL Words
Philip Schulz | Wilker Aziz | Khalil Sima’an
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
Examining the Relationship between Preordering and Word Order Freedom in Machine Translation
Joachim Daiber | Miloš Stanojević | Wilker Aziz | Khalil Sima’an
Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers

pdf
ILLC-UvA Adaptation System (Scorpio) at WMT’16 IT-DOMAIN Task
Hoang Cuong | Stella Frank | Khalil Sima’an
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf
A Shared Task on Multimodal Machine Translation and Crosslingual Image Description
Lucia Specia | Stella Frank | Khalil Sima’an | Desmond Elliott
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

pdf
Multi30K: Multilingual English-German Image Descriptions
Desmond Elliott | Stella Frank | Khalil Sima’an | Lucia Specia
Proceedings of the 5th Workshop on Vision and Language

pdf bib
Factoring Adjunction in Hierarchical Phrase-Based SMT
Sophie Arnoult | Khalil Sima’an
Proceedings of the 2nd Deep Machine Translation Workshop

pdf abs
Adapting to All Domains at Once: Rewarding Domain Invariance in SMT
Hoang Cuong | Khalil Sima’an | Ivan Titov
Transactions of the Association for Computational Linguistics, Volume 4

Existing work on domain adaptation for statistical machine translation has consistently assumed access to a small sample from the test distribution (target domain) at training time. In practice, however, the target domain may not be known at training time or it may change to match user needs. In such situations, it is natural to push the system to make safer choices, giving higher preference to domain-invariant translations, which work well across domains, over risky domain-specific alternatives. We encode this intuition by (1) inducing latent subdomains from the training data only; (2) introducing features which measure how specialized phrases are to individual induced sub-domains; (3) estimating feature weights on out-of-domain data (rather than on the target domain). We conduct experiments on three language pairs and a number of different domains. We observe consistent improvements over a baseline which does not explicitly reward domain invariance.

pdf abs
Hierarchical Permutation Complexity for Word Order Evaluation
Miloš Stanojević | Khalil Sima’an
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Existing approaches for evaluating word order in machine translation work with metrics computed directly over a permutation of word positions in system output relative to a reference translation. However, every permutation factorizes into a permutation tree (PET) built of primal permutations, i.e., atomic units that do not factorize any further. In this paper we explore the idea that permutations factorizing into (on average) shorter primal permutations should represent simpler ordering as well. Consequently, we contribute Permutation Complexity, a class of metrics over PETs and their extension to forests, and define tight metrics, a sub-class of metrics implementing this idea. Subsequently we define example tight metrics and empirically test them in word order evaluation. Experiments on the WMT13 data sets for ten language pairs show that a tight metric is more often than not better than the baselines.

pdf abs
Universal Reordering via Linguistic Typology
Joachim Daiber | Miloš Stanojević | Khalil Sima’an
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

In this paper we explore the novel idea of building a single universal reordering model from English to a large number of target languages. To build this model we exploit typological features of word order for a large number of target languages together with source (English) syntactic features and we train this model on a single combined parallel corpus representing all (22) involved language pairs. We contribute experimental evidence for the usefulness of linguistically defined typological features for building such a model. When the universal reordering model is used for preordering followed by monotone translation (no reordering inside the decoder), our experiments show that this pipeline gives comparable or improved translation performance with a phrase-based baseline for a large number of language pairs (12 out of 22) from diverse language families.

2015

pdf
Reordering Grammar Induction
Miloš Stanojević | Khalil Sima’an
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
BEER 1.1: ILLC UvA submission to metrics and tuning task
Miloš Stanojević | Khalil Sima’an
Proceedings of the Tenth Workshop on Statistical Machine Translation

pdf bib
Modelling the Adjunct/Argument Distinction in Hierarchical Phrase-Based SMT
Sophie Arnoult | Khalil Sima’an
Proceedings of the 1st Deep Machine Translation Workshop

pdf
Delimiting Morphosyntactic Search Space with Source-Side Reordering Models
Joachim Daiber | Khalil Sima’an
Proceedings of the 1st Deep Machine Translation Workshop

pdf
Latent Domain Word Alignment for Heterogeneous Corpora
Hoang Cuong | Khalil Sima’an
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
Machine translation with source-predicted target morphology
Joachim Daiber | Khalil Sima’an
Proceedings of Machine Translation Summit XV: Papers

2014

pdf abs
All Fragments Count in Parser Evaluation
Jasmijn Bastings | Khalil Sima’an
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

PARSEVAL, the default paradigm for evaluating constituency parsers, calculates parsing success (Precision/Recall) as a function of the number of matching labeled brackets across the test set. Nodes in constituency trees, however, are connected together to reflect important linguistic relations such as predicate-argument and direct-dominance relations between categories. In this paper, we present FREVAL, a generalization of PARSEVAL, where the precision and recall are calculated not only for individual brackets, but also for co-occurring, connected brackets (i.e. fragments). FREVAL fragments precision (FLP) and recall (FLR) interpolate the match across the whole spectrum of fragment sizes ranging from those consisting of individual nodes (labeled brackets) to those consisting of full parse trees. We provide evidence that FREVAL is informative for inspecting relative parser performance by comparing a range of existing parsers.

pdf
BEER: BEtter Evaluation as Ranking
Miloš Stanojević | Khalil Sima’an
Proceedings of the Ninth Workshop on Statistical Machine Translation

pdf bib
Bilingual Markov Reordering Labels for Hierarchical SMT
Gideon Maillette de Buy Wenniger | Khalil Sima’an
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf
Evaluating Word Order Recursively over Permutation-Forests
Miloš Stanojević | Khalil Sima’an
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf
How Synchronous are Adjuncts in Translation Data?
Sophie Arnoult | Khalil Sima’an
Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation

pdf
Latent Domain Translation Models in Mix-of-Domains Haystack
Hoang Cuong | Khalil Sima’an
Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers

pdf
Fitting Sentence Level Translation Evaluation with Many Dense Features
Miloš Stanojević | Khalil Sima’an
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Latent Domain Phrase-based Models for Adaptation
Hoang Cuong | Khalil Sima’an
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Modern statistical parsers are trained on large annotated corpora (treebanks). These treebanks usually consist of sentences addressing different subdomains (e.g. sports, politics, music), which implies that the statistics gathered by current statistical parsers are mixtures of subdomains of language use. In this paper we present a method that exploits raw subdomain corpora gathered from the web to introduce subdomain sensitivity into a given parser. We employ statistical techniques for creating an ensemble of domain sensitive parsers, and explore methods for amalgamating their predictions. Our experiments show that introducing domain sensitivity by exploiting raw corpora can improve over a tough, state-of-the-art baseline.

pdf
Relational-Realizational Parsing
Reut Tsarfaty | Khalil Sima’an
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

2007

pdf
Smoothing a Lexicon-based POS Tagger for Arabic and Hebrew
Saib Manour | Khalil Sima’an | Yoad Winter
Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources

pdf
Three-Dimensional Parametrization for Parsing Morphologically Rich Languages
Reut Tsarfaty | Khalil Sima’an
Proceedings of the Tenth International Conference on Parsing Technologies

pdf
Supertagged Phrase-Based Statistical Machine Translation
Hany Hassan | Khalil Sima’an | Andy Way
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

2006

pdf abs
Corpus Variations for Translation Lexicon Induction
Rebecca Hwa | Carol Nichols | Khalil Sima’an
Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers

Lexical mappings (word translations) between languages are an invaluable resource for multilingual processing. While the problem of extracting lexical mappings from parallel corpora is well-studied, the task is more challenging when the language samples are from non-parallel corpora. The goal of this work is to investigate one such scenario: finding lexical mappings between dialects of a diglossic language, in which people conduct their written communications in a prestigious formal dialect, but they communicate verbally in a colloquial dialect. Because the two dialects serve different socio-linguistic functions, parallel corpora do not naturally exist between them. An example of a diglossic dialect pair is Modern Standard Arabic (MSA) and Levantine Arabic. In this paper, we evaluate the applicability of a standard algorithm for inducing lexical mappings between comparable corpora (Rapp, 1999) to such diglossic corpora pairs. The focus of the paper is an in-depth error analysis, exploring the notion of relatedness in diglossic corpora and scrutinizing the effects of various dimensions of relatedness (such as mode, topic, style, and statistics) on the quality of the resulting translation lexicon.

2005

pdf
Choosing an Optimal Architecture for Segmentation and POS-Tagging of Modern Hebrew
Roy Bar-Haim | Khalil Sima’an | Yoad Winter
Proceedings of the ACL Workshop on Computational Approaches to Semitic Languages

2004

pdf
BioGrapher: Biography Questions as a Restricted Domain Question Answering Task
Oren Tsur | Maarten de Rijke | Khalil Sima’an
Proceedings of the Conference on Question Answering in Restricted Domains

2003

pdf abs
On maximizing metrics for syntactic disambiguation
Khalil Sima’an
Proceedings of the Eighth International Conference on Parsing Technologies

Given a probabilistic parsing model and an evaluation metric for scoring the match between parse-trees, e.g., PARSEVAL [Black et al., 1991], this paper addresses the problem of how to select the on average best scoring parse-tree for an input sentence. Common wisdom dictates that it is optimal to select the parse with the highest probability, regardless of the evaluation metric. In contrast, the Maximizing Metrics (MM) method [Goodman, 1998, Stolcke et al., 1997] proposes that an algorithm that optimizes the evaluation metric itself constitutes the optimal choice. We study the MM method within parsing. We observe that the MM does not always hold for tree-bank models, and that optimizing weak metrics is not interesting for semantic processing. Subsequently, we state an alternative proposition: the optimal algorithm must maximize the metric that scores parse-trees according to linguistically relevant features. We present new algorithms that optimize metrics that take into account increasingly more linguistic features, and exhibit experiments in support of our claim.