case word 0.00306152
model language 0.002506392
language model 0.002506392
first word 0.002386159
previous word 0.00228956
unknown word 0.002265463
word casings 0.002240257
word casing 0.002237645
ambiguous word 0.002202179
word combinations 0.002200378
alternate word 0.00219709
word strategy 0.002192258
word mispeling 0.002180792
word lenon 0.002180792
statistical model 0.001996989
language text 0.001846656
model probabilities 0.001836567
markov model 0.001808584
guage model 0.001805879
model parameters 0.001794069
channel model 0.001792293
igram model 0.001790332
model prob 0.001779442
unigram model 0.001778694
complex model 0.0017742
efficient model 0.001759358
case information 0.001668514
model 0.00156327
news data 0.001543747
case content 0.00149674
ing case 0.001489968
data source 0.001475088
training data 0.001463634
language models 0.001443624
evaluation case 0.001430875
case tokens 0.001428202
first words 0.001413709
news source 0.001390597
frequent case 0.001382766
case con 0.001380932
case label 0.001378275
first sentence 0.001376324
morphological features 0.001372826
case tag 0.001368163
mixed case 0.001367596
limited case 0.001351701
case normalization 0.001350568
corresponding case 0.001341065
case restoration 0.001340101
true case 0.001335936
correct case 0.001334773
language modeling 0.001333612
trigram case 0.00133281
test data 0.001329284
few words 0.001321342
ing text 0.001316112
natural language 0.001316046
likely case 0.00131099
case items 0.0013012
sentence level 0.001299778
certain words 0.001298391
case disambiguation 0.001298339
unknown words 0.001293013
sentence surface 0.001281106
latter case 0.00127345
case mismatch 0.00127345
icant case 0.00127345
subsequent words 0.001236191
sentence letter 0.001233428
true sentence 0.001232841
language process 0.001231575
sentence size 0.001225006
tokenized words 0.001210501
surrounding words 0.001209445
infrequent words 0.00120839
sentence splitting 0.001201535
trigram language 0.001198542
sentence boundaries 0.001196587
raw text 0.001185618
recognition text 0.001172547
text sources 0.001167615
text processing 0.001153576
text corpora 0.001153377
training corpus 0.001153067
context information 0.001137103
translation system 0.001134938
truecased text 0.001134403
quality text 0.001134056
text segment 0.001133917
additional features 0.001123848
same context 0.001120388
large training 0.001114844
regular text 0.001111907
feature space 0.00111081
current news 0.001102003
normal text 0.001100653
text readability 0.001100653
ful feature 0.001089317
ace data 0.001082866
token news 0.001075523
