words words 0.003548
same word 0.003115478
word accuracy 0.003016368
context word 0.002998133
model words 0.00295167
oov word 0.002922571
word type 0.002920307
word types 0.002910631
many word 0.002900508
rare word 0.002882877
single word 0.002809231
word tokens 0.002795743
word count 0.002789169
word sequences 0.002777405
preceding word 0.00277173
current word 0.002759735
text words 0.002443739
words baseline 0.002425534
same words 0.002362538
training data 0.00233727
tag features 0.002300323
context words 0.002245193
other features 0.002236873
similar words 0.002234168
oov words 0.002169631
rare words 0.002129937
common words 0.002107743
consecutive words 0.002095493
unseen words 0.002073255
test data 0.002071732
hmm features 0.002027031
words com 0.002020164
data set 0.002016543
smoothing features 0.001935673
such features 0.001930921
baseline model 0.001829204
orthographic features 0.001817823
language model 0.001783371
words 0.001774
model representation 0.001763637
transition features 0.001746918
features com 0.001735874
boolean features 0.001731936
lsa features 0.001725946
informative features 0.001721448
essential features 0.001720213
crf feature 0.001712114
training set 0.001710733
unlabeled data 0.001709377
feature space 0.001672656
journal data 0.001653392
labeled data 0.001650044
overall data 0.001631142
smoothing model 0.001623633
feature representation 0.001605828
data sparsity 0.001586971
boolean feature 0.001586096
wsj data 0.00157383
ing model 0.001570692
biomedical data 0.00155949
nal data 0.001557115
oanc data 0.001550854
crf model 0.001545914
pos tag 0.001541559
unknown model 0.00151348
features 0.00148971
language models 0.001489352
size training 0.001452183
probabilistic model 0.00144619
test set 0.001445195
different domain 0.00143824
hmm models 0.001420972
lsa model 0.001413906
unigram model 0.00141327
scl model 0.001411654
unlabeled training 0.001403567
pos tagging 0.001393739
crf training 0.001383974
training sentences 0.001382245
pos tagger 0.001355675
labeled training 0.001344234
feature 0.00134387
training sets 0.001343635
training time 0.00134067
ing models 0.001276673
baseline tagger 0.001276263
supervised pos 0.001272075
annotated training 0.001268548
entropy models 0.001266494
crf models 0.001251895
enough training 0.001245042
standard pos 0.001237897
baseline system 0.00123368
pos tags 0.001215535
large corpus 0.001203587
hmm tagging 0.001200114
markov models 0.001194949
baseline hmm 0.001188855
test distribution 0.001185191
model 0.00117767
