2012
pdf
bib
The TALP-UPC phrase-based translation systems for WMT12: Morphology simplification and domain adaptation
Lluís Formiga
|
Carlos A. Henríquez Q.
|
Adolfo Hernández
|
José B. Mariño
|
Enric Monte
|
José A. R. Fonollosa
Proceedings of the Seventh Workshop on Statistical Machine Translation
pdf
bib
Proceedings of ACL 2012 Student Research Workshop
Jackie C. K. Cheung
|
Jun Hatori
|
Carlos Henriquez
|
Ann Irvine
Proceedings of ACL 2012 Student Research Workshop
2011
pdf
bib
abs
Deriving translation units using small additional corpora
Carlos A. Henríquez Q.
|
José B. Mariño
|
Rafael E. Banchs
Proceedings of the 15th Annual Conference of the European Association for Machine Translation
We present a novel strategy to derive new translation units using an additional bilingual corpus and a previously trained SMT system. The units were used to adapt the SMT system. The derivation process can be applied when the additional corpus is very small compared with the original training corpus, and it does not require computing new word alignments over all corpora. The strategy is based on the Levenshtein distance and its resulting path. We report a statistically significant improvement, at a confidence level of 99%, when adapting an Ngram-based Catalan-Spanish system using an additional corpus that represents less than 0.5% of the original training corpus. The additional translation units were able to solve morphological and lexical errors and added previously unknown words to the vocabulary.
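As an illustration of the Levenshtein distance and its resulting path mentioned in this abstract, the following is a minimal sketch and not the authors' implementation. It assumes the comparison is made between two token sequences (for example a system output and a reference translation); the function name `levenshtein_path` and the operation labels are hypothetical.

```python
from typing import List, Tuple

Edit = Tuple[str, str, str]  # (operation, source token, target token)


def levenshtein_path(src: List[str], tgt: List[str]) -> Tuple[int, List[Edit]]:
    """Return the word-level Levenshtein distance and one optimal edit path."""
    n, m = len(src), len(tgt)
    # dist[i][j] = edit distance between src[:i] and tgt[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a source token
                dist[i][j - 1] + 1,         # insert a target token
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from the bottom-right cell to recover one optimal path.
    path: List[Edit] = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and src[i - 1] == tgt[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            path.append(("match" if same else "substitute", src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            path.append(("delete", src[i - 1], ""))
            i -= 1
        else:
            path.append(("insert", "", tgt[j - 1]))
            j -= 1
    path.reverse()
    return dist[n][m], path
```

The non-matching steps of the path (substitutions, insertions, deletions) mark where two sentences differ, which is the kind of information a derivation process like the one described could build on; how the paper turns the path into translation units is not stated in the abstract.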
pdf
bib
Enhancing scarce-resource language translation through pivot combinations
Marta R. Costa-jussà
|
Carlos Henríquez
|
Rafael E. Banchs
Proceedings of 5th International Joint Conference on Natural Language Processing
2010
pdf
bib
abs
UPC-BMIC-VDU system description for the IWSLT 2010: testing several collocation segmentations in a phrase-based SMT system
Carlos Henríquez
|
Marta R. Costa-jussà
|
Vidas Daudaravicius
|
Rafael E. Banchs
|
José B. Mariño
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes the UPC-BMIC-VMU participation in the IWSLT 2010 evaluation campaign. The SMT system is a standard phrase-based system enriched with novel segmentations. These segmentations are computed using statistical measures such as Log-likelihood, T-score, Chi-squared, Dice, Mutual Information or Gravity-Counts. The analysis of translation results allows us to divide the measures into three groups. First, Log-likelihood, Chi-squared and T-score tend to combine high-frequency words, and the resulting collocation segments are very short. They improve the SMT system by adding new translation units. Second, Mutual Information and Dice tend to combine low-frequency words, and the collocation segments are short. They improve the SMT system by smoothing the translation units. Third, Gravity-Counts tends to combine high- and low-frequency words, and the collocation segments are long. However, in this case, the SMT system is not improved. Thus, the road map for translation system improvement is to introduce new phrases with either low-frequency or high-frequency words. It is hard to improve translation quality by introducing new phrases that mix low- and high-frequency words. Experimental results are reported in the French-to-English IWSLT 2010 evaluation, where our system was ranked 3rd out of nine systems.
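To make two of the measures named in this abstract concrete, here is a minimal sketch that scores adjacent word pairs with the Dice coefficient and pointwise Mutual Information. It uses the standard textbook definitions, which may differ from the exact variants in the paper, and it does not perform the segmentation itself. The function name `association_scores` is hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def association_scores(tokens: List[str]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Map each adjacent word pair to (Dice, pointwise mutual information)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent pairs

    scores: Dict[Tuple[str, str], Tuple[float, float]] = {}
    for (x, y), f_xy in bigrams.items():
        # Dice: 2 * f(x, y) / (f(x) + f(y))
        dice = 2.0 * f_xy / (unigrams[x] + unigrams[y])
        # PMI: log2( P(x, y) / (P(x) * P(y)) ), using relative frequencies
        p_xy = f_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        pmi = math.log2(p_xy / (p_x * p_y))
        scores[(x, y)] = (dice, pmi)
    return scores
```

A pair such as ("new", "york") that always occurs together receives a Dice score of 1.0, so a segmenter thresholding on these scores would tend to keep it as one unit; the thresholds and the other measures listed in the abstract are outside this sketch.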
pdf
bib
Using Collocation Segmentation to Augment the Phrase Table
Carlos A. Henríquez Q.
|
Marta Ruiz Costa-jussà
|
Vidas Daudaravicius
|
Rafael E. Banchs
|
José B. Mariño
Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR
2009
pdf
bib
The TALP-UPC Phrase-Based Translation System for EACL-WMT 2009
José A. R. Fonollosa
|
Maxim Khalilov
|
Marta R. Costa-jussà
|
José B. Mariño
|
Carlos A. Henríquez Q.
|
Adolfo Hernández H.
|
Rafael E. Banchs
Proceedings of the Fourth Workshop on Statistical Machine Translation
2008
pdf
bib
abs
The TALP&I2R SMT systems for IWSLT 2008.
Maxim Khalilov
|
Marta R. Costa-jussà
|
Carlos A. Henríquez Q.
|
José A. R. Fonollosa
|
Adolfo Hernández H.
|
José B. Mariño
|
Rafael E. Banchs
|
Chen Boxing
|
Min Zhang
|
Aiti Aw
|
Haizhou Li
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper gives a description of the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the architecture of the 2008 systems and outlines the translation schemes we have used, focusing mainly on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks.
pdf
bib
The TALP-UPC Ngram-Based Statistical Machine Translation System for ACL-WMT 2008
Maxim Khalilov
|
Adolfo Hernández H.
|
Marta R. Costa-jussà
|
Josep M. Crego
|
Carlos A. Henríquez Q.
|
Patrik Lambert
|
José A. R. Fonollosa
|
José B. Mariño
|
Rafael E. Banchs
Proceedings of the Third Workshop on Statistical Machine Translation