arabic word 0.00378074
word corpus 0.00344773
segmentation word 0.00322491
word segmentation 0.00322491
word error 0.002955782
word evaluation 0.002833734
language model 0.002718662
arabic words 0.00269879
word segmenter 0.002690367
word tokens 0.00267033
general word 0.002664834
arabic corpus 0.00261481
robust word 0.002581719
model training 0.002423167
model probability 0.002391826
model probabilities 0.002369371
stem stem 0.00226032
joint model 0.002191082
model score 0.002174473
model vocabulary 0.002153176
trigram model 0.002045573
model parameters 0.002014647
arabic sentence 0.001987442
suffixes arabic 0.001947389
arabic transliteration 0.001883545
morphological analysis 0.001869407
arabic tokens 0.00183741
training corpus 0.001826077
native arabic 0.001814333
test corpus 0.00181192
new words 0.001789568
english words 0.001772044
arabic infix 0.001769433
arabic translit 0.001753617
approximate arabic 0.001746109
arabic treebank 0.001746109
prefix stem 0.00174297
model 0.00173799
morphological analyzer 0.001705878
new corpus 0.001705588
stem candidate 0.00169611
candidate stem 0.00169611
new stem 0.001694848
unsupervised stem 0.001686035
evaluation corpus 0.001667804
stem suffix 0.0016674
morphological relationships 0.001658823
segmented corpus 0.001642961
segmentation algorithm 0.00163891
corpus size 0.001631568
words prefixes 0.001571982
segmentation error 0.001567032
unknown stem 0.001565953
stem vocabulary 0.001545346
segmentation system 0.001521904
foreign words 0.001503949
stem ratio 0.001478086
arabic 0.00147391
morpheme segmentation 0.001472394
stem acquisition 0.001452079
unsegmented corpus 0.001447714
stem candidates 0.001439007
various language 0.00142213
illegitimate stem 0.001420476
ldc corpus 0.001416363
stem properties 0.001407298
stem alywm 0.001405047
segmentation accuracy 0.001378902
proper segmentation 0.001371316
natural language 0.001357664
language systems 0.001339739
language processing 0.001319223
test set 0.001292711
system training 0.001289001
trigram language 0.001288255
unsupervised algorithm 0.001276705
input segmentation 0.001273713
segmentation ambiguity 0.001271812
language families 0.001252329
words 0.00122488
segmentation ambiguities 0.001209952
segmented training 0.001187238
prefix sequence 0.001166945
only text 0.001166427
same root 0.001153017
size baseline 0.00114798
text corpora 0.001146807
corpus 0.0011409
character position 0.001134878
different sizes 0.001117906
input text 0.001114778
morpheme sequence 0.001108449
possible segmentations 0.001106169
high prefix 0.001105276
suffix sequence 0.001091375
prefix table 0.001082345
new morphemes 0.001061172
error analyses 0.001057539
error rate 0.00103155
probability estimation 0.001027279
