Yuval Marton


2022

Where’s the Learning in Representation Learning for Compositional Semantics and the Case of Thematic Fit
Mughilan Muthupari | Samrat Halder | Asad Sayeed | Yuval Marton
Proceedings of the Fifth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP

Observing that for certain NLP tasks, such as semantic role prediction or thematic fit estimation, random embeddings perform as well as pre-trained embeddings, we explore what settings allow for this, and examine where most of the learning is encoded: the word embeddings, the semantic role embeddings, or “the network”. We find nuanced answers, depending on the task and its relation to the training objective. We examine these representation learning aspects in multi-task learning, where role prediction and role-filling are supervised tasks, while several thematic fit tasks are outside the models’ direct supervision. We observe a non-monotonic relation between some tasks’ quality scores and the training data size. In order to better understand this observation, we analyze these results using easier, per-verb versions of these tasks.

Thematic Fit Bits: Annotation Quality and Quantity Interplay for Event Participant Representation
Yuval Marton | Asad Sayeed
Proceedings of the Thirteenth Language Resources and Evaluation Conference

Modeling thematic fit (a verb-argument compositional semantics task) currently requires a very large amount of labeled data. We take a large, linguistically machine-annotated corpus and replace corpus layers with output from higher-quality, more modern taggers. We compare the old and new corpus versions’ impact on a verb-argument fit modeling task, using a high-performing neural approach. We discover that higher annotation quality dramatically reduces our data requirement while demonstrating better supervised predicate-argument classification. When applying the model to psycholinguistic tasks outside the training objective, however, we see clear gains at scale in only one of two thematic fit estimation tasks, and no clear gains in the other. We also see that quality improves with training size, though it may plateau or even decline in one task. Lastly, we test the effect of role set size. All this suggests that the quality/quantity interplay is not all you need. We replicate previous studies while modifying certain role representation details and set a new state-of-the-art in event modeling, using a fraction of the data. We make the new corpus version public.

2016

E-TIPSY: Search Query Corpus Annotated with Entities, Term Importance, POS Tags, and Syntactic Parses
Yuval Marton | Kristina Toutanova
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present E-TIPSY, a search query corpus annotated with named Entities, Term Importance, POS tags, and SYntactic parses. This corpus contains crowdsourced (gold) annotations of the three most important terms in each query. In addition, it contains automatically produced annotations of named entities, part-of-speech tags, and syntactic parses for the same queries. This corpus comes in two formats: (1) Sober Subset: annotations that two or more crowd workers agreed upon, and (2) Full Glass: all annotations. We analyze the strikingly low correlation between term importance and syntactic headedness, which invites research into effective ways of combining these different signals. Our corpus can serve as a benchmark for term importance methods aimed at improving search engine quality and as an initial step toward developing a dataset of gold linguistic analysis of web search queries. In addition, it can be used as a basis for linguistic inquiries into the kind of expressions used in search.

2014

A Unified Model for Soft Linguistic Reordering Constraints in Statistical Machine Translation
Junhui Li | Yuval Marton | Philip Resnik | Hal Daumé III
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages
Yoav Goldberg | Yuval Marton | Ines Rehbein | Yannick Versley | Özlem Çetinoğlu | Joel Tetreault
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages

2013

Online Relative Margin Maximization for Statistical Machine Translation
Vladimir Eidelman | Yuval Marton | Philip Resnik
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Dependency Parsing of Modern Standard Arabic with Lexical and Inflectional Features
Yuval Marton | Nizar Habash | Owen Rambow
Computational Linguistics, Volume 39, Issue 1 - March 2013

Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages
Yoav Goldberg | Yuval Marton | Ines Rehbein | Yannick Versley
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

SPMRL’13 Shared Task System: The CADIM Arabic Dependency Parser
Yuval Marton | Nizar Habash | Owen Rambow | Sarah Alkhulani
Proceedings of the Fourth Workshop on Statistical Parsing of Morphologically-Rich Languages

2012

*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)
Eneko Agirre | Johan Bos | Mona Diab | Suresh Manandhar | Yuval Marton | Deniz Yuret
*SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012)

Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages
Marianna Apidianaki | Ido Dagan | Jennifer Foster | Yuval Marton | Djamé Seddah | Reut Tsarfaty
Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages

On-Demand Distributional Semantic Distance and Paraphrasing
Yuval Marton
Tutorial Abstracts at the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2011

Improving Arabic Dependency Parsing with Form-based and Functional Morphological Features
Yuval Marton | Nizar Habash | Owen Rambow
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

Filtering Antonymous, Trend-Contrasting, and Polarity-Dissimilar Distributional Paraphrases for Improving Statistical Machine Translation
Yuval Marton | Ahmed El Kholy | Nizar Habash
Proceedings of the Sixth Workshop on Statistical Machine Translation

2010

Improving Arabic Dependency Parsing with Lexical and Inflectional Morphological Features
Yuval Marton | Nizar Habash | Owen Rambow
Proceedings of the NAACL HLT 2010 First Workshop on Statistical Parsing of Morphologically-Rich Languages

Reordering Matrix Post-verbal Subjects for Arabic-to-English SMT
Marine Carpuat | Yuval Marton | Nizar Habash
Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs

We improve our recently proposed technique for integrating Arabic verb-subject (VS) constructions in SMT word alignment (Carpuat et al., 2010) by distinguishing between matrix (main clause) and non-matrix (subordinate clause) VS constructions. In gold translations, most matrix VS constructions are translated in inverted SV order, while non-matrix VS constructions are inverted in only half the cases. In addition, while detecting verbs and their subjects is a hard task, our syntactic parser detects VS constructions better in matrix than in non-matrix clauses. As a result, reordering only matrix VS for word alignment consistently improves translation quality over a phrase-based SMT baseline, and over reordering all VS constructions, in both medium- and large-scale settings. In fact, the improvements obtained by reordering matrix VS in the medium-scale setting remarkably represent 44% of the gain in BLEU and 51% of the gain in TER obtained with a word alignment training bitext that is 5 times larger.

Improved Statistical Machine Translation with Hybrid Phrasal Paraphrases Derived from Monolingual Text and a Shallow Lexical Resource
Yuval Marton
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Research Papers

Paraphrase generation is useful for various NLP tasks. However, pivoting techniques for paraphrasing have limited applicability due to their reliance on parallel texts, although they benefit from linguistic knowledge implicit in the sentence alignment. Distributional paraphrasing has wider applicability, but does not benefit from any linguistic knowledge. We combine a distributional semantic distance measure (based on a non-annotated corpus) with a shallow linguistic resource to create a hybrid semantic distance measure of words, which we extend to phrases. We embed this extended hybrid measure in a distributional paraphrasing technique, benefiting from both linguistic knowledge and independence from parallel texts. Evaluating on statistical machine translation tasks by augmenting translation models with paraphrase-based translation rules, we show that our novel technique is superior to the non-augmented baseline and to both the distributional and pivot paraphrasing techniques. We train models on both a full-size dataset and a simulated “low density” small dataset.

Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
Marine Carpuat | Yuval Marton | Nizar Habash
Proceedings of the ACL 2010 Conference Short Papers

Domain-Independent Novel Event Discovery and Semi-Automatic Event Annotation
Hao Li | Xiang Li | Heng Ji | Yuval Marton
Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation

2009

The University of Maryland Statistical Machine Translation System for the Fourth Workshop on Machine Translation
Chris Dyer | Hendra Setiawan | Yuval Marton | Philip Resnik
Proceedings of the Fourth Workshop on Statistical Machine Translation

Improved Statistical Machine Translation Using Monolingually-Derived Paraphrases
Yuval Marton | Chris Callison-Burch | Philip Resnik
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Estimating Semantic Distance Using Soft Semantic Constraints in Knowledge-Source – Corpus Hybrid Models
Yuval Marton | Saif Mohammad | Philip Resnik
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

2008

Online Large-Margin Training of Syntactic and Structural Translation Features
David Chiang | Yuval Marton | Philip Resnik
Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing

Soft Syntactic Constraints for Hierarchical Phrased-Based Translation
Yuval Marton | Philip Resnik
Proceedings of ACL-08: HLT