2021
Bilingual Terminology Extraction Using Neural Word Embeddings on Comparable Corpora
Darya Filippova, Burcu Can, Gloria Corpas Pastor
Proceedings of the Student Research Workshop Associated with RANLP 2021
Term and glossary management are vital steps in the preparation of every language specialist, and they play a very important role in the education of translation professionals. The growing emphasis on efficient time management, and the constant time constraints observed in every job sector, increase the need for automatic glossary compilation. Many well-performing bilingual AET systems are based on processing parallel data; however, such parallel corpora are not always available for a specific domain or language pair. Domain-specific, bilingual access to information and its retrieval based on comparable corpora is a very promising area of research that requires a detailed analysis of both the available data sources and the possible extraction techniques. This work focuses on domain-specific automatic terminology extraction from comparable corpora for the English–Russian language pair using neural word embeddings.
Interpreting and Technology: Is the Sky Really the Limit?
Gloria Corpas Pastor
Proceedings of the Translation and Interpreting Technology Online Conference
Nowadays there is a pressing need to develop interpreting-related technologies, with practitioners and other end-users increasingly calling for tools tailored to their needs and to their new interpreting scenarios. At the same time, however, interpreting as a human activity has resisted complete automation for various reasons, such as fear, unawareness, communication complexities and a lack of dedicated tools. Several computer-assisted interpreting tools and resources for interpreters have been developed, although they are rather modest in terms of the support they provide. In the same vein, and despite the pressing need for aids to multilingual mediation, machine interpreting is still under development, with the exception of a few success stories. This paper presents the results of VIP, an R&D project on language technologies applied to interpreting. It is the ‘seed’ of a family of projects on interpreting technologies that are currently being developed or have just been completed at the Research Institute of Multilingual Language Technologies (IUITLM), University of Malaga.
Cross-Lingual Named Entity Recognition via FastAlign: a Case Study
Ali Hatami, Ruslan Mitkov, Gloria Corpas Pastor
Proceedings of the Translation and Interpreting Technology Online Conference
Named Entity Recognition is an essential task in natural language processing that detects entities and classifies them into predetermined categories. An entity is a meaningful word or phrase that refers to a proper noun. Named entities play an important role in many NLP tasks, such as Information Extraction, Question Answering and Machine Translation. In Machine Translation, named entities often cause translation failures regardless of local context, affecting the quality of the output. Annotating named entities is a time-consuming and expensive process, especially for low-resource languages. One solution to this problem is to apply word alignment methods to bilingual parallel corpora in which only one side has been annotated. The goal is to extract named entities in the target language by using the annotated corpus of the source language. In this paper, we compare the performance of two alignment methods, the Grow-diag-final-and and Intersect Symmetrisation heuristics, for projecting annotations across an English–Brazilian Portuguese bilingual corpus in order to detect named entities in Brazilian Portuguese. A NER model trained on the annotated data extracted with each alignment method is used to evaluate the performance of the aligners. Experimental results show that Intersect Symmetrisation achieves higher performance scores than the Grow-diag-final-and heuristic for Brazilian Portuguese.
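The two heuristics named above differ in how they merge the source-to-target and target-to-source word alignments before entity labels are copied across. The following is a minimal Python sketch, not the authors' implementation: it assumes alignments are sets of (source index, target index) pairs, uses plain entity tags such as "PER" and "LOC" rather than BIO tags, and the function names are hypothetical. Intersection keeps only links both directions agree on (higher precision), while grow-diag-final-and starts from the intersection and adds links from the union (higher recall).

```python
from typing import List, Set, Tuple

# Hypothetical representation: an alignment is a set of (source_index, target_index) pairs.
Alignment = Set[Tuple[int, int]]

# Horizontal, vertical and diagonal neighbours used by the grow-diag step.
NEIGHBOURS = [(-1, 0), (0, -1), (1, 0), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def intersect(s2t: Alignment, t2s: Alignment) -> Alignment:
    """Intersection symmetrisation: keep only links proposed in both directions."""
    return s2t & t2s


def grow_diag_final_and(s2t: Alignment, t2s: Alignment) -> Alignment:
    """Grow-diag-final-and: start from the intersection, add links from the union."""
    union = s2t | t2s
    alignment = set(s2t & t2s)
    aligned_src = {s for s, _ in alignment}
    aligned_tgt = {t for _, t in alignment}

    def add(s: int, t: int) -> None:
        alignment.add((s, t))
        aligned_src.add(s)
        aligned_tgt.add(t)

    # GROW-DIAG: repeatedly add union links next to existing links,
    # provided at least one of the two words is still unaligned.
    added = True
    while added:
        added = False
        for s, t in sorted(alignment):  # iterate over a snapshot while adding
            for ds, dt in NEIGHBOURS:
                ns, nt = s + ds, t + dt
                if (ns, nt) in union and (ns, nt) not in alignment and (
                    ns not in aligned_src or nt not in aligned_tgt
                ):
                    add(ns, nt)
                    added = True

    # FINAL-AND: add remaining directional links whose words are BOTH unaligned.
    for direction in (s2t, t2s):
        for s, t in sorted(direction):
            if s not in aligned_src and t not in aligned_tgt:
                add(s, t)
    return alignment


def project_labels(src_labels: List[str], tgt_len: int, alignment: Alignment) -> List[str]:
    """Naive annotation projection: copy each source entity tag to its aligned target token(s)."""
    tgt_labels = ["O"] * tgt_len
    for s, t in sorted(alignment):
        if src_labels[s] != "O":
            tgt_labels[t] = src_labels[s]
    return tgt_labels


if __name__ == "__main__":
    # Toy example: "Maria lives in Lisbon" / "Maria mora em Lisboa".
    s2t = {(0, 0), (1, 1), (2, 2), (3, 3)}
    t2s = {(0, 0), (1, 1), (3, 3)}
    print(intersect(s2t, t2s))
    print(grow_diag_final_and(s2t, t2s))
    print(project_labels(["PER", "O", "O", "LOC"], 4, intersect(s2t, t2s)))
```

In the toy example, grow-diag-final-and recovers the link for "in"/"em" that only one direction proposed, whereas intersection drops it; the projected entity tags are the same either way because that token is not an entity.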
2018
Wolves at SemEval-2018 Task 10: Semantic Discrimination based on Knowledge and Association
Shiva Taslimipoor, Omid Rohanian, Le An Ha, Gloria Corpas Pastor, Ruslan Mitkov
Proceedings of the 12th International Workshop on Semantic Evaluation
This paper describes the system submitted to SemEval 2018 shared task 10 ‘Capturing Discriminative Attributes’. We use a combination of knowledge-based and co-occurrence features to capture the semantic difference between two words in relation to an attribute. We define scores based on association measures, n-gram counts, word similarity and ConceptNet relations. The system is ranked 4th (joint) on the official leaderboard of the task.
2017
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology
Irina Temnikova, Constantin Orasan, Gloria Corpas Pastor, Stephan Vogel
Proceedings of the Workshop Human-Informed Translation and Interpreting Technology
2015
MiniExperts: An SVM Approach for Measuring Semantic Textual Similarity
Hanna Béchara, Hernani Costa, Shiva Taslimipoor, Rohit Gupta, Constantin Orasan, Gloria Corpas Pastor, Ruslan Mitkov
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)
The EXPERT project: Advancing the state of the art in hybrid translation technologies
Constantin Orasan, Alessandro Cattelan, Gloria Corpas Pastor, Josef van Genabith, Manuel Herranz, Juan José Arevalillo, Qun Liu, Khalil Sima’an, Lucia Specia
Proceedings of Translating and the Computer 37
2014
A comparative User Evaluation of Terminology Management Tools for Interpreters
Hernani Costa, Gloria Corpas Pastor, Isabel Durán Muñoz
Proceedings of the 4th International Workshop on Computational Terminology (Computerm)
iCompileCorpora: a web-based application to semi-automatically compile multilingual comparable corpora
Hernani Costa, Gloria Corpas Pastor, Miriam Seghiri
Proceedings of Translating and the Computer 36
2013
All that glitters is not gold when translating phraseological units
Gloria Corpas Pastor
Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technologies
A flexible framework for collocation retrieval and translation from parallel and comparable corpora
Oscar Mendoza Rivera, Ruslan Mitkov, Gloria Corpas Pastor
Proceedings of the Workshop on Multi-word Units in Machine Translation and Translation Technologies
2012
ProTermino: a comprehensive web-based terminological management tool based on knowledge representation
Isabel Durán Muñoz, Gloria Corpas Pastor, Le An Ha
Proceedings of Translating and the Computer 34
2008
Translation universals: do they exist? A corpus-based NLP study of convergence and simplification
Gloria Corpas Pastor, Ruslan Mitkov, Naveed Afzal, Viktor Pekar
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Research Papers
Convergence and simplification are two of the so-called universals in translation studies. The first postulates that translated texts tend to be more similar to one another than non-translated texts. The second postulates that translated texts are simpler and easier to understand than non-translated ones. This paper discusses the results of a project that applies NLP techniques to comparable corpora of translated and non-translated texts in Spanish, seeking to establish whether these two universals hold (Corpas Pastor, 2008).
Mutual Bilingual Terminology Extraction
Le An Ha, Gabriela Fernandez, Ruslan Mitkov, Gloria Corpas
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper describes a novel methodology to perform bilingual terminology extraction, in which automatic alignment is used to improve the performance of terminology extraction for each language. The strengths of monolingual terminology extraction for each language are exploited to improve the performance of terminology extraction in the other language, thanks to the availability of a sentence-level aligned bilingual corpus, and an automatic noun phrase alignment mechanism. The experiment indicates that weaknesses in monolingual terminology extraction due to the limitation of resources in certain languages can be overcome by using another language which has no such limitation.
2007
Lost in specialised translation: the corpus as an inexpensive and under-exploited aid for language service providers
Gloria Corpas Pastor
Proceedings of Translating and the Computer 29