parallel data 0.00357346
english data 0.00353938
training data 0.003405443
test data 0.00331267
such data 0.003195326
ing data 0.003173341
language model 0.00312988
new data 0.003031398
web data 0.002998964
data analysis 0.002945068
monolingual data 0.002865864
data resource 0.002865514
total data 0.002852443
crawl data 0.002841163
comparable data 0.002826609
data consortium 0.002823697
exploratory data 0.002823697
commoncrawl data 0.002810492
allel data 0.002802422
wmt data 0.002787634
moncrawl data 0.002787446
data stor 0.002785859
english language 0.00267242
different language 0.002295053
parallel text 0.00219632
language pairs 0.002194933
language tokens 0.002189423
same language 0.002180706
language models 0.002173173
target language 0.002170892
new language 0.002164438
several language 0.002143256
translation system 0.002058007
translation test 0.00205364
language pair 0.002045956
language names 0.00202849
training text 0.002028303
many language 0.002022544
specific language 0.002001149
language technology 0.001992698
foreign language 0.001975375
language codes 0.001947588
language precision 0.001945947
correct language 0.001944257
language students 0.001944063
gram language 0.001938352
language name 0.001933499
language identification 0.001931614
news translation 0.001922405
wrong language 0.001921461
language identifier 0.001919177
language identifi 0.001919177
domain translation 0.001903829
parallel training 0.001872323
machine translation 0.001867538
news text 0.001804295
parallel corpus 0.001788587
translation probabilities 0.00173975
guage model 0.001736698
translation experiments 0.001731687
ment model 0.001722447
model weights 0.00172082
english translations 0.001709938
full translation 0.001709924
language 0.00168633
translation systems 0.001681745
speech translation 0.001667397
translation performance 0.001661188
target text 0.001660712
parallel sentences 0.001607259
translation tasks 0.001606936
chine translation 0.001578517
text mining 0.001544159
word segmentation 0.001529148
test corpus 0.001527797
crowdsourced translation 0.001526983
word removal 0.001525592
word align 0.001521346
stop word 0.001510799
english tokens 0.001489183
parallel document 0.001483329
news training 0.0014802980000000001
text chunk 0.001450845
model 0.00144355
allel text 0.001425282
parallel documents 0.00141757
text chunks 0.001413626
sentence pairs 0.00141302
text blocks 0.001409828
parallel texts 0.0014018
test set 0.001393819
other mining 0.001382169
domain test 0.001368949
parallel segments 0.001356429
parallel sen 0.001343144
candidate parallel 0.001341373
sentence alignment 0.001316556
machine translations 0.001297126
baseline corpus 0.001296551
other researchers 0.001295807
