other language 0.003988775999999999
different language 0.0038576319999999997
target language 0.0036917449999999997
language identification 0.00359889
previous language 0.003514332
language models 0.003481715
automatic language 0.003447356
language processing 0.003424672
natural language 0.0034226639999999997
embedded language 0.0034077779999999998
language corpora 0.0034009039999999997
language identifica 0.003400626
language identi 0.003384189
generic language 0.003380195
shelf language 0.003377018
language identifier 0.003370776
language iden 0.003367235
grate language 0.0033631489999999997
language identitication 0.0033631489999999997
language 0.00312468
training data 0.0017470139999999999
domain data 0.0017109550000000001
feature set 0.001646467
ing data 0.0016344950000000001
same languages 0.001523972
data sources 0.001446953
feature selection 0.001438988
short text 0.0014100760000000001
model parameters 0.001405595
synthetic data 0.001404494
bayes model 0.001389951
text messages 0.0013854140000000002
machine translation 0.001379209
final model 0.00136061
training documents 0.001358943
embedded model 0.001334408
languages url 0.001332858
event model 0.001310073
parallel corpus 0.001292657
trained model 0.001292417
specific features 0.001291718
text categorization 0.0012856970000000001
input text 0.001280539
traditional text 0.001262432
translation purposes 0.001254653
tool languages 0.001254474
conventional text 0.0012499710000000001
european languages 0.0012468409999999998
cepts text 0.0012441750000000001
candidate feature 0.0012394349999999999
based feature 0.001238897
other systems 0.001226163
feature selec 0.001224528
key features 0.001219381
dataset documents 0.001199644
other programming 0.001180015
short documents 0.001166205
classification tools 0.001164439
other researchers 0.001149682
bayes classification 0.001139913
document datasets 0.001116159
other sys 0.001113862
identification corpus 0.0011020140000000001
test dataset 0.0010866580000000001
model 0.00105131
government documents 0.001048907
single domain 0.001042821
full set 0.0010417920000000002
standalone classification 0.00104069
input documents 0.001036668
input document 0.001035642
relative accuracy 0.001029587
different domains 0.001021233
document represen 0.001004786
languages 9.66565E-4
sharding approach 9.652230000000001E-4
translation 9.63827E-4
parallel cor 9.58509E-4
web services 9.54061E-4
feature 9.47216E-4
identification research 9.46527E-4
matching algorithm 9.410309999999999E-4
validation set 9.39385E-4
high accuracy 9.2819E-4
test corpora 9.24287E-4
identification solution 9.13302E-4
web page 9.128140000000001E-4
good accuracy 9.11654E-4
biomedical parallel 9.10569E-4
many web 9.09261E-4
new domain 9.05594E-4
preferred method 9.00788E-4
web pages 8.94445E-4
web service 8.910890000000001E-4
same encoding 8.851180000000001E-4
identification step 8.7417E-4
emea corpus 8.738400000000001E-4
ferent domain 8.7215E-4
such datasets 8.709449999999999E-4
absolute accuracy 8.69175E-4
