Gabriel Lopes

Also published as: Gabriel P. Lopes, Gabriel Pereira Lopes, Jose Gabriel Lopes, Jose Gabriel P. Lopes, José Gabriel Pereira Lopes

2016

English-Portuguese Biomedical Translation Task Using a Genuine Phrase-Based Statistical Machine Translation Approach
José Aires | Gabriel Lopes | Luís Gomes
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

First Steps Towards Coverage-Based Document Alignment
Luís Gomes | Gabriel Pereira Lopes
Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

Using Bilingual Segments in Generating Word-to-word Translations
Kavitha Mahesh | Gabriel Pereira Lopes | Luís Gomes
Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6)

We argue that bilingual lexicons automatically extracted from parallel corpora, whose entries have since been validated by linguists and classified as correct or incorrect, should constitute a specific parallel corpus. In this paper, we propose to use word-to-word translations to learn morph-units (comprising bilingual stems and suffixes) from those bilingual lexicons for two language pairs, L1-L2 and L1-L3, in order to induce a bilingual lexicon for the language pair L2-L3, in addition to learning morph-units for this third language pair. The applicability of bilingual morph-units in L1-L2 and L1-L3 is examined from the perspective of pivot-based lexicon induction for the language pair L2-L3 with L1 as bridge. While the lexicon is derived by transitivity, the correspondences are identified based on previously learnt bilingual stems and suffixes rather than surface translation forms. The induced pairs are validated using a binary classifier trained on morphological and similarity-based features using an existing, automatically acquired, manually validated bilingual translation lexicon for the language pair L2-L3. We discuss the use of English (EN)-French (FR) and English (EN)-Portuguese (PT) lexicons of word-to-word translations in generating word-to-word translations for the language pair FR-PT with EN as pivot language. Generated translations are first filtered using an SVM-based FR-PT classifier and then manually validated.
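
The transitivity step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it pairs surface forms directly, whereas the abstract says the actual correspondences rely on learnt bilingual stems and suffixes. The function name `induce_pivot_lexicon` and the toy dictionaries are assumptions made up for illustration.

```python
from collections import defaultdict

def induce_pivot_lexicon(en_fr, en_pt):
    """Pair every FR translation of an EN word with every PT translation
    of the same EN word (surface-form transitivity only, a simplification)."""
    fr_pt = defaultdict(set)
    for en_word, fr_words in en_fr.items():
        for fr_word in fr_words:
            fr_pt[fr_word].update(en_pt.get(en_word, ()))
    # Drop FR words for which the pivot produced no PT candidate.
    return {fr: pts for fr, pts in fr_pt.items() if pts}

# Hypothetical toy lexicons: EN is the pivot (L1), FR and PT are L2 and L3.
en_fr = {"house": {"maison"}, "bank": {"banque", "rive"}}
en_pt = {"house": {"casa"}, "bank": {"banco", "margem"}}

candidates = induce_pivot_lexicon(en_fr, en_pt)
```

In this toy data the ambiguous pivot word "bank" yields the wrong candidate pair rive-banco alongside the correct ones, which suggests why the abstract follows induction with a classifier and manual validation.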

First Steps Towards Coverage-Based Sentence Alignment
Luís Gomes | Gabriel Pereira Lopes
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

In this paper, we introduce a coverage-based scoring function that discriminates between parallel and non-parallel sentences. When plugged into Bleualign, a state-of-the-art sentence aligner, our function improves both precision and recall of alignments over the originally proposed BLEU score. Furthermore, since our scoring function uses Moses phrase tables directly, we avoid the need to translate the texts to be aligned, which is time-consuming and a potential source of alignment errors.

Named entities and, more generally, Multiword Lexical Units (MWUs) are important for various applications. However, language-independent methods for automatically extracting MWUs do not provide us with clean data. In this paper, we therefore propose a method for selecting possible named entities from automatically extracted MWUs; a statistics-based, language-independent, unsupervised approach is then applied to these possible named entities in order to cluster them according to their type. Statistical features used by our clustering process are described and motivated. The Model-Based Clustering Analysis (MBCA) software enabled us to obtain different clusters for proposed named entities. The method was applied to Bulgarian and English. For some clusters, precision is very high; other clusters still need further refinement. Based on the obtained clusters, it is also possible to classify new possible named entities.

2003

Automatic Acquisition of Word Interaction Patterns from Corpora
Veska Noncheva | Joaquim Ferreira da Silva | Gabriel Lopes
Proceedings of the 2003 EACL Workshop on Language Modeling for Text Entry Methods

2002

Using Co-Composition for Acquiring Syntactic and Semantic Subcategorisation
Pablo Gamallo | Alexandre Agustini | Gabriel P. Lopes
Proceedings of the ACL-02 Workshop on Unsupervised Lexical Acquisition

2001

Cognates alignment
António Ribeiro | Gaël Dias | Gabriel Lopes | João Mexia
Proceedings of Machine Translation Summit VIII

Some authors (Simard et al.; Melamed; Danielsson & Mühlenbock) have suggested measures of similarity of words in different languages so as to find extra clues for alignment of parallel texts. Cognate words, like ‘Parliament’ and ‘Parlement’, in English and French respectively, provide extra anchors that help to improve the quality of the alignment. In this paper, we extend an alignment algorithm proposed by Ribeiro et al. using typical contiguous and non-contiguous sequences of characters extracted using a statistically sound method (Dias et al.). With these typical sequences, we are able to find more reliable correspondence points and improve the alignment quality without resorting to heuristics to identify cognates.

2000

Using Confidence Bands for Parallel Texts Alignment
António Ribeiro | Gabriel Lopes | João Mexia
Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics

A self-learning method of parallel texts alignment
António Ribeiro | Gabriel Lopes | João Mexia
Proceedings of the Fourth Conference of the Association for Machine Translation in the Americas: Technical Papers

This paper describes a language-independent method for alignment of parallel texts that re-uses acquired knowledge. The system extracts word translation equivalents and re-uses them as correspondence points in order to enhance the alignment of parallel texts. Points that may cause misalignment are filtered using confidence bands of linear regression analysis instead of heuristics, which are not theoretically reliable. Homographs bootstrap the alignment process so as to build the primary word translation lexicon. At each step, the previously acquired lexicon is re-used so as to repeatedly make finer-grained alignments and produce more reliable translation lexicons.
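
The filtering idea in this abstract (fit a regression line through candidate correspondence points and discard points far from it) can be sketched roughly as below. This is a simplified stand-in, not the paper's procedure: it uses a band of `k` residual standard errors around the fitted line instead of proper confidence bands, and the function name `filter_points`, the threshold `k`, and the sample points are all assumptions for illustration.

```python
def filter_points(points, k=2.0):
    """Keep candidate correspondence points (x, y) whose residual from a
    least-squares line is within k residual standard errors.
    Simplified band, assumed here in place of true confidence bands."""
    n = len(points)
    if n < 3:
        return list(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return list(points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in points]
    # Residual standard error with n - 2 degrees of freedom.
    s = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return [p for p, r in zip(points, residuals) if abs(r) <= k * s]

# Hypothetical word positions in two parallel texts; (25, 90) is a bad anchor.
points = [(0, 0), (10, 11), (20, 19), (30, 31), (40, 40),
          (50, 49), (60, 61), (70, 70), (25, 90)]
kept = filter_points(points)
```

The abstract describes the process as iterative, so in practice a filter of this kind would be applied again after each round of lexicon extraction, with the surviving points serving as anchors for finer-grained alignment.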