parallel corpus 0.002706738
several corpus 0.00237972
corpus encoding 0.002207769
corpus development 0.002110421
corpus component 0.002094263
distinct language 0.001884186
corpus 0.00182982
markup language 0.001809305
language engineering 0.001697722
language families 0.001687671
language 0.0014589
linguistic corpora 0.001417277
parallel alignment 0.001359446
speech data 0.001287058
project languages 0.001234514
different types 0.001203761
european languages 0.001176741
parallel component 0.001141361
different levels 0.001088233
separate data 0.001085565
sentence level 0.001076002
data type 0.001073571
data architecture 0.001013519
single text 9.9899E-4
cee languages 9.85839E-4
few corpora 9.69996E-4
linguistic corpora 9.32017E-4
english version 9.2697E-4
encoding format 8.93153E-4
parallel 8.76918E-4
cesalign document 8.75258E-4
fourth document 8.75258E-4
comparable texts 8.74678E-4
encoding standard 8.65916E-4
original english 8.48051E-4
sentence boundaries 8.43288E-4
qualitative information 8.21186E-4
copernicus project 8.07416E-4
following sections 7.82592E-4
cee project 7.52203E-4
national science 7.45109E-4
project multext 7.40336E-4
token level 7.35749E-4
languages 7.34075E-4
annotation 7.30739E-4
eagles project 7.2893E-4
lre project 7.2893E-4
standard generalized 7.25762E-4
paragraph level 7.18977E-4
corpora 7.14199E-4
various levels 7.06817E-4
european projects 7.00655E-4
cesdoc version 6.76414E-4
eastern european 6.74331E-4
representation subgroup 6.70298E-4
sgml documents 6.5941E-4
distribution efforts 6.57407E-4
retrieval tasks 6.51856E-4
specific characteristics 6.51706E-4
document 6.47101E-4
easy processing 6.35769E-4
texts 6.18562E-4
morphosyntactic markup 6.18065E-4
cesdoc encoding 6.16257E-4
cesana encoding 6.14102E-4
significant resources 6.12781E-4
principled encoding 6.0869E-4
sentence 6.08057E-4
type definition 6.02877E-4
sgml markup 5.93975E-4
generalized markup 5.882E-4
morphosyntactic descriptions 5.82099E-4
information 5.73954E-4
separate sgml 5.67482E-4
cesana versions 5.6736E-4
separate dtd 5.55955E-4
efficient extraction 5.50616E-4
joint effort 5.48922E-4
george orwell 5.2722E-4
speech 5.25405E-4
wordform lexicons 5.20506E-4
tei guidelines 5.19941E-4
special concerns 5.19579E-4
format 5.15204E-4
science foundation 5.11051E-4
project 5.00439E-4
projects multext 4.97886E-4
english 4.88864E-4
standard 4.87967E-4
alignment 4.82528E-4
national 4.78166E-4
sentences 4.69451E-4
annotations 4.68898E-4
level 4.67945E-4
european 4.42666E-4
version 4.38106E-4
representation 4.31744E-4
elements 4.29607E-4
documents 4.1584E-4
distribution 4.09695E-4
