game state 0.00257646
system state 0.002206662
state information 0.002104501
new state 0.001993426
state space 0.001989611
environment state 0.001972622
reward function 0.001882883
mapping state 0.001865144
large state 0.001854646
goal state 0.001848423
ment state 0.001807582
game domain 0.001791721
state spaces 0.001790982
state transitions 0.001786386
ronment state 0.001778498
state snapshots 0.001775804
word action 0.001622863
function environment 0.001576372
state 0.00156191
value function 0.001553598
current game 0.001454029
reward policy 0.001436338
action selection 0.001395441
action sequence 0.001363861
action space 0.001357146
possible actions 0.001353102
learning problem 0.001325989
game our 0.001294253
correct action 0.001288328
puzzle game 0.001279466
overall action 0.001271893
learning framework 0.001270553
sentence documents 0.001262056
game tutorials 0.00125065
output actions 0.001243875
soccer game 0.001235922
final action 0.0012354
last action 0.001234592
flash game 0.001229204
zle game 0.001229204
same sentence 0.001206594
annotated action 0.001202983
different reward 0.001201553
puzzle action 0.001194361
different word 0.001177748
human user 0.001171881
function 0.00116566
correct actions 0.001165414
next actions 0.001164573
action sequences 0.001159731
current policy 0.001158594
current sentence 0.001152764
policy distribution 0.001151211
dialogue systems 0.001150259
action doc 0.001149828
action accu 0.001146167
action sent 0.001143864
action min 0.001143864
notated action 0.001143864
example sentence 0.00114253
reinforcement learning 0.001138579
environment reward 0.001127935
previous actions 0.001127223
ing approach 0.001119108
dialogue management 0.001111413
learning problems 0.001105336
reward value 0.001105161
policy gradient 0.001097713
training document 0.001097256
windows domain 0.001084114
annotated actions 0.001080069
ing system 0.001066807
reward functions 0.001064654
feature representation 0.001064205
learning techniques 0.00106312
system environment 0.001055464
optimal policy 0.001053469
document set 0.001053226
correct word 0.001052301
positive reward 0.001051014
spurious actions 0.001045089
puzzle domain 0.001042087
training documents 0.001031646
complete policy 0.001028675
executable actions 0.001026356
immediate reward 0.001026091
system performance 0.001025786
future reward 0.001025257
crossblock domain 0.001019156
natural language 0.001011151
test documents 0.001007857
word random 0.001004669
such approaches 0.000999114
last word 0.000998565
pervised learning 0.000995025
language instructions 0.000991649
lexical feature 0.000991328
challenging learning 0.000990725
simple reward 0.000990175
document accuracy 0.00099006
