model probability 0.00362323
language model 0.003396689
model standard 0.003304905
network model 0.003197635
baseline model 0.003024349
model parameters 0.003014347
model performance 0.002984284
model size 0.002978201
model component 0.002972585
trigram model 0.002903234
word error 0.002898337
word vocabulary 0.002852405
probabilistic model 0.002821032
input word 0.002814639
next word 0.002785883
entropy model 0.002768905
entropy model 0.002763099
word sequence 0.002761107
language model 0.002741612
predictor model 0.002738599
model architecture 0.002728506
each model 0.002728223
ist model 0.002723526
word string 0.002698998
unseen word 0.002672648
word clustering 0.002662594
head word 0.002651312
word representations 0.002632328
word prefix 0.002598762
model 0.00247344
training data 0.002383424
training neural 0.002382616
network training 0.002287625
training corpus 0.002227498
training algorithm 0.002135488
training results 0.002117232
language models 0.002104309
training sentences 0.001920598
network models 0.001905255
training procedure 0.001894193
probability function 0.001889597
training ppl 0.001867269
further training 0.001834724
training criterion 0.00183212
heldout training 0.001831976
training examples 0.001827755
network models 0.00166157
machine translation 0.001639784
preceding words 0.001569142
training 0.00156343
unknown words 0.001547105
prefix words 0.001545642
neural network 0.001543381
conditional probability 0.001507158
probability estimation 0.00149769
probability assignment 0.001484913
probability distribution 0.001460287
joint probability 0.001448672
connectionist models 0.001438375
connectionist models 0.001431092
models further 0.001431092
hidden layer 0.001427073
proper probability 0.001425794
probability mass 0.001407106
probability normalization 0.001406654
probability distribution 0.001402023
test data 0.001388476
trigram language 0.001353043
layer input 0.001336898
input layer 0.001336898
output layer 0.001325001
data size 0.001324755
words 0.00128117
training data 0.001280499
network parameters 0.001265102
feature vector 0.001242728
language mod 0.001222353
language modeling 0.001212614
network output 0.001192647
feature space 0.001187061
structured language 0.001185924
models 0.00118106
neural networks 0.001180891
translation 0.00118054
labeled data 0.001175092
probabilistic neural 0.0011667779999999998
smoothing results 0.001149878
probability 0.00114979
max layer 0.00112489
softmax layer 0.001116163
data sparse 0.001112761
input feature 0.001111297
output-hidden layer 0.00110745
same problem 0.001106051
data sparseness 0.00110411
possible headword 0.001102613
hidden parse 0.001098525
heldout data 0.00108854
smoothing method 0.001085246
likelihood function 0.001085046
