2015
Applications of Social Media Text Analysis
Atefeh Farzindar
|
Diana Inkpen
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts
Analyzing social media texts is a complex problem that is difficult to address using traditional Natural Language Processing (NLP) methods. Our tutorial focuses on presenting new methods for NLP tasks and applications that work on noisy and informal texts, such as the ones from social media. Automatic processing of large collections of social media texts is important because they contain a lot of useful information, due to the increasing popularity of all types of social media. Use of social media and messaging apps grew 203 percent year-on-year in 2013, with overall app use rising 115 percent over the same period, as reported by Statista, citing data from Flurry Analytics. This growth means that 1.61 billion people are now active in social media around the world, and this is expected to advance to 2 billion users in 2016, led by India. The research shows that consumers are now spending 5.6 hours daily on digital media, including social media and mobile internet usage. At the heart of this interest is the ability for users to create and share content via a variety of platforms such as blogs, micro-blogs, collaborative wikis, multimedia sharing sites, and social networking sites. The unprecedented volume and variety of user-generated content, as well as the user interaction network, constitute new opportunities for understanding social behavior and building socially intelligent systems. Therefore, it is important to investigate methods for knowledge extraction from social media data. Furthermore, we can use this information to detect and retrieve more related content about events, such as photos and video clips that have caption texts.
2014
Hashtag Occurrences, Layout and Translation: A Corpus-driven Analysis of Tweets Published by the Canadian Government
Fabrizio Gotti
|
Philippe Langlais
|
Atefeh Farzindar
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
We present an aligned bilingual corpus of 8758 tweet pairs in French and English, derived from Canadian government agencies. Hashtags appear in a tweet’s prologue, announcing its topic, or in the tweet’s text in lieu of traditional words, or in an epilogue. Hashtags are words prefixed with a pound sign in 80% of the cases. The rest is mostly multiword hashtags, for which we describe a segmentation algorithm. A manual analysis of the bilingual alignment of 5000 hashtags shows that 5% (French) to 18% (English) of them don’t have a counterpart in their containing tweet’s translation. This analysis shows that 80% of multiword hashtags are correctly translated by humans, and that the mistranslation of the rest may be due to incomplete translation directives regarding social media. We show how these resources and their analysis can guide the design of a machine translation pipeline, and its evaluation. A baseline system implementing a tweet-specific tokenizer yields promising results. The system is improved by translating epilogues, prologues, and text separately. We attempt to feed the SMT engine with the original hashtag and some alternatives (“dehashed” version or a segmented version of multiword hashtags), but translation quality improves at the cost of hashtag recall.
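The abstract above mentions a segmentation algorithm for multiword hashtags without describing it. As a rough illustration only, the following is a minimal sketch of one common approach: a dictionary-based dynamic program that splits a hashtag into the fewest known words. The function name `segment_hashtag`, the toy vocabulary `VOCAB`, and the fewest-words criterion are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Optional, Set


def segment_hashtag(tag: str, vocab: Set[str]) -> Optional[List[str]]:
    """Split a hashtag into the fewest vocabulary words.

    Returns None when no full segmentation exists. This is an illustrative
    technique, not the algorithm described in the paper.
    """
    body = tag.lstrip("#").lower()
    n = len(body)
    # best[i] holds the shortest known segmentation of body[:i], or None.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            prefix = best[start]
            word = body[start:end]
            if prefix is not None and word in vocab:
                candidate = prefix + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


# Hypothetical toy vocabulary; a real system would use a large lexicon.
VOCAB = {"canada", "day", "open", "data", "job", "jobs"}

print(segment_hashtag("#CanadaDay", VOCAB))
print(segment_hashtag("#opendata", VOCAB))
```

A segmented form produced this way is one plausible source for the "segmented version" of a hashtag that the abstract says was offered to the SMT engine as an alternative.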
Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)
Atefeh Farzindar
|
Diana Inkpen
|
Michael Gamon
|
Meena Nagarajan
Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)
Collaboratively Constructed Linguistic Resources for Language Variants and their Exploitation in NLP Application – the case of Tunisian Arabic and the Social Media
Fatiha Sadat
|
Fatma Mallek
|
Mohamed Boudabous
|
Rahma Sellami
|
Atefeh Farzindar
Proceedings of Workshop on Lexical and Grammatical Resources for Language Processing
Automatic Identification of Arabic Language Varieties and Dialects in Social Media
Fatiha Sadat
|
Farnazeh Kazemi
|
Atefeh Farzindar
Proceedings of the Second Workshop on Natural Language Processing for Social Media (SocialNLP)
2013
Les défis de l’analyse des réseaux sociaux pour le traitement automatique des langues [Natural language processing challenges for analysing social networks]
Atefeh Farzindar
|
Mathieu Roche
Traitement Automatique des Langues, Volume 54, Numéro 3 : Traitement automatique du langage naturel pour l'analyse des réseaux sociaux (TAL et réseaux sociaux) [Social Networks and NLP]
Proceedings of the Workshop on Language Analysis in Social Media
Cristian Danescu-Niculescu-Mizil
|
Atefeh Farzindar
|
Michael Gamon
|
Diana Inkpen
|
Meena Nagarajan
Proceedings of the Workshop on Language Analysis in Social Media
Translating Government Agencies’ Tweet Feeds: Specificities, Problems and (a few) Solutions
Fabrizio Gotti
|
Philippe Langlais
|
Atefeh Farzindar
Proceedings of the Workshop on Language Analysis in Social Media
2012
Proceedings of the Workshop on Semantic Analysis in Social Media
Atefeh Farzindar
|
Diana Inkpen
Proceedings of the Workshop on Semantic Analysis in Social Media
Evaluation of Domain Adaptation Techniques for TRANSLI in a Real-World Environment
Atefeh Farzindar
|
Wael Khreich
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Commercial MT User Program
Statistical Machine Translation (SMT) systems specialized for one domain often perform poorly when applied to other domains. Domain adaptation techniques allow SMT models trained from a source domain with abundant data to accommodate different target domains with limited data. This paper evaluates the performance of two adaptive techniques based on log-linear and mixture models on data from the legal domain in real-world settings. Performance evaluation includes post-editing time and effort required by a professional post-editor to improve the quality of machine-generated translations to meet industry standards, as well as traditional automated scoring techniques (BLEU scores). Results indicate that the domain adaptation techniques can yield a significant increase in BLEU score (up to three points) and a significant reduction in post-editing time of about one second per word in an operational environment.
2010
TRANSLI: trusted automated translation at the service of justice
Atefeh Farzindar
Proceedings of Translating and the Computer 32
Estimating Machine Translation Post-Editing Effort with HTER
Lucia Specia
|
Atefeh Farzindar
Proceedings of the Second Joint EM+/CNGL Workshop: Bringing MT to the User: Research on Integrating MT in the Translation Industry
Although Machine Translation (MT) has been attracting more and more attention from the translation industry, the quality of current MT systems still requires humans to post-edit translations to ensure their quality. The time necessary to post-edit bad quality translations can be the same or even longer than that of translating without an MT system. It is well known, however, that the quality of an MT system is generally not homogeneous across all translated segments. In order to make MT more useful to the translation industry, it is therefore crucial to have a mechanism to judge MT quality at the segment level to prevent bad quality translations from being post-edited within the translation workflow. We describe an approach to estimate translation post-editing effort at sentence level in terms of Human-targeted Translation Edit Rate (HTER) based on a number of features reflecting the difficulty of translating the source sentence and discrepancies between the source and translation sentences. HTER is a simple metric and obtaining HTER annotated data can be made part of the translation workflow. We show that this approach is more reliable at filtering out bad translations than other simple criteria commonly used in the translation industry, such as sentence length.
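For readers unfamiliar with the metric named above, HTER is the number of edits needed to turn an MT output into its human post-edited version, divided by the number of words in the post-edited version. The sketch below approximates it with a plain word-level edit distance. It is a simplification: full TER also counts block shifts as single edits, which this example does not model, and the names `word_edit_distance` and `approx_hter` are hypothetical.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_hter(mt_output: str, post_edited: str) -> float:
    """Approximate HTER: edits divided by post-edited length, without shifts."""
    hyp, ref = mt_output.split(), post_edited.split()
    if not ref:
        raise ValueError("post-edited reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)


# One substitution over a five-word post-edited sentence.
print(approx_hter("the court reject the appeal",
                  "the court rejected the appeal"))
```

Scores computed this way on post-edited segments could serve as training targets for the kind of sentence-level estimator the abstract describes. The features and learning method of that estimator are not reproduced here.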
2009
An Automatic Translation Management System for Legal Texts
Atefeh Farzindar
Proceedings of Machine Translation Summit XII: Commercial MT User Program
2008
Automatic Translation of Court Judgments
Fabrizio Gotti
|
Guy Lapalme
|
Elliott Macklovitch
|
Atefeh Farzindar
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT
2006
Résumé multidocuments orienté par une requête complexe [Multi-document summarization guided by a complex query]
Atefeh Farzindar
|
Guy Lapalme
Actes de la 13ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
We present an information synthesis system for producing multi-document summaries guided by a complex query. After analyzing the user profile, expressed as complex questions, we compare the similarity between the documents to be summarized and the questions at two levels: global and detailed. This study demonstrates the importance of examining, for a given query, the relevance of a sentence within the thematic structure of the document. This methodology was applied during our participation in the DUC 2005 evaluation campaign, where our system ranked among the best.
2005
Production automatique du résumé de textes juridiques: évaluation de qualité et d’acceptabilité [Automatic production of legal text summaries: evaluation of quality and acceptability]
Atefeh Farzindar
|
Guy Lapalme
Actes de la 12ème conférence sur le Traitement Automatique des Langues Naturelles. Articles longs
We describe a project on automatic text summarization for the legal domain, for which we used a corpus of judgments of the Federal Court of Canada. We present our summarization system, LetSum, and the evaluation of the summaries it produces. The evaluation of 120 summaries by 12 lawyers shows that the quality of the summaries produced by LetSum is comparable to that of summaries written by humans.
2004
Legal Text Summarization by Exploration of the Thematic Structure and Argumentative Roles
Atefeh Farzindar
|
Guy Lapalme
Text Summarization Branches Out
Développement d’un système de Résumé automatique de Textes Juridiques [Development of an automatic summarization system for legal texts]
Atefeh Farzindar
Actes de la 11ème conférence sur le Traitement Automatique des Langues Naturelles. REncontres jeunes Chercheurs en Informatique pour le Traitement Automatique des Langues (Posters)
We describe our method for automatically producing summaries of legal texts. This is a new application of summarization that lets legal professionals quickly consult the key ideas of a legal decision in order to find the case law relevant to their needs. Our approach is based on exploiting the architecture of the documents and their thematic structures in order to automatically build summary records that increase the coherence and readability of the summary. In this article we detail the design of the different components of the system, called LetSum, and the evaluation results.