other language 0.00404304
language identification 0.003444391
first language 0.003441073
language urls 0.003361845
language approach 0.003329332
language recognition 0.003234651
language filter 0.003229967
language characteristics 0.003210941
primary language 0.003186024
language 0.00291364
english text 0.00244271
other languages 0.00228466
english results 0.002024955
text corpus 0.001985167
english texts 0.001942227
english websites 0.001813751
total english 0.001796792
english spell 0.001786694
english speaking 0.001742914
english posts 0.001739041
user languages 0.001636282
target languages 0.001602561
web corpus 0.001581795
content languages 0.001503433
available languages 0.00149585
frequent languages 0.001479452
english 0.0014494
top languages 0.001427294
noteworthy languages 0.001409765
other hand 0.001393667
other strategies 0.001381432
other contexts 0.001381432
other fine 0.001381432
corpus size 0.001342865
text categorization 0.001330724
text cor 0.001324369
corpus building 0.00129762
microtext corpus 0.001277452
web documents 0.001273441
art text 0.001250667
free corpus 0.001246265
data sources 0.001231424
world data 0.001226922
same domain 0.001221266
word url 0.001190879
friendfeed data 0.00117126
languages 0.00115526
several urls 0.001141712
identification system 0.001126911
same time 0.001086213
web pages 0.001044715
same tools 0.001044564
several ones 0.00100582
documents filter 9.9983E-4
web page 9.93093E-4
corpus 9.91857E-4
similar work 9.87738E-4
source software 9.85322E-4
different content 9.83584E-4
source tools 9.70719E-4
several scenarios 9.48424E-4
several weeks 9.48424E-4
media documents 9.43963E-4
open source 9.40017E-4
user pages 9.35799E-4
first step 9.28993E-4
news category 9.2157E-4
result set 9.21301E-4
web service 9.20824E-4
first com 9.18571E-4
pages links 9.16806E-4
identification tool 9.11643E-4
domain names 9.08324E-4
identification software 9.05635E-4
frequent words 9.0158E-4
different light 8.93936E-4
linguistic structure 8.89453E-4
document validity 8.79835E-4
further work 8.76816E-4
regular set 8.71261E-4
moderation system 8.71167E-4
short urls 8.67517E-4
new messages 8.64551E-4
linguistic studies 8.64069E-4
identification task 8.582E-4
filter first 8.4376E-4
first filter 8.4376E-4
major user 8.39933E-4
domain name 8.34219E-4
linguistic relevance 8.34168E-4
guage identification 8.29557E-4
short messages 8.29303E-4
main time 8.25235E-4
identification microtext 8.16346E-4
benchmark set 8.15992E-4
expensive process 8.11888E-4
second step 8.08306E-4
ing approach 8.04948E-4
tering process 8.0344E-4
gathering process 8.0344E-4
